Genome Paper Figures: DMT analysis Version 2

Introduction

This notebook summarizes the analysis performed for the Aspidoscelis marmoratus genome publication. Work on this project began in 2011 and spans many sub-projects, several of which did not make it into the final publication in their entirety. Rather than upload every sub-project regardless of whether all of its material was relevant, we decided to summarize the manuscript analysis in a single, organized notebook. Any additional information can be requested by contacting me at duncan@baumannlab.org.

Imports

In [1]:
from __future__ import print_function
from __future__ import division
%matplotlib inline
%load_ext ipycache
import matplotlib 
import matplotlib.pylab as plt
import matplotlib.ticker as ticker
from matplotlib.ticker import ScalarFormatter
from matplotlib.patches import Rectangle
import matplotlib.gridspec as gridspec
from matplotlib.collections import BrokenBarHCollection
import matplotlib.patches as mpatches
from shapely.geometry.polygon import Polygon
from descartes import PolygonPatch
import os
import sys
from IPython.display import Markdown, display
import pandas as pd
from pandas.tools.plotting import scatter_matrix
import seaborn as sns
import scipy.stats as stats
import itertools
from itertools import izip
import multiprocessing as mp
import numpy as np
import glob
from pprint import pprint
from IPython.display import Image
from IPython.display import display, HTML
from collections import defaultdict
from IPython.display import set_matplotlib_formats
from matplotlib.ticker import FuncFormatter
import math
plt.style.use('seaborn-whitegrid')
plt.rcParams["font.family"] = "sans-serif"
plt.rcParams["font.sans-serif"] = "Arial"
matplotlib.rcParams['pdf.fonttype'] = 42
matplotlib.rcParams['ps.fonttype'] = 42
pd.set_option('display.max_colwidth', -1)
display(HTML("<style>.container { width:80% !important; }</style>"))
sys.path = ['../bin'] + sys.path
import gff3_plotting as gff3
import fasta_classes as fa
import blizzard2 as bliz
/home/dut/anaconda3/envs/anaconda2/lib/python2.7/site-packages/IPython/config.py:13: ShimWarning: The `IPython.config` package has been deprecated since IPython 4.0. You should import from traitlets.config instead.
  "You should import from traitlets.config instead.", ShimWarning)
/home/dut/anaconda3/envs/anaconda2/lib/python2.7/site-packages/IPython/utils/traitlets.py:5: UserWarning: IPython.utils.traitlets has moved to a top-level traitlets package.
  warn("IPython.utils.traitlets has moved to a top-level traitlets package.")

Naming Colors and Constants

Here I define the color palettes and the lookup dictionaries used to convert between sample IDs and short animal names.

In [220]:
colors = ['#543005','#8c510a','#bf812d','#dfc27d','#f6e8c3','#c7eae5','#80cdc1','#35978f','#01665e','#003c30']
animal_names  = [
    '003',
    '001',
    '122',
    '4278',
    '9721',
    '6993',
    '9177',
    '8450',
    '12512',
    '12513',
]

animal_ids = [
    'Atig003',
    'Atig001',
    'Atig_122',
    'Atig_4278',
    'A.tig_9721',
    'Atig_6993',
    'Atig_9177',
    'A_tigris8450',
    'A.tig_12512',
    'A.tig_12513',
]


og_animal_ids = [
    'Atig003',
    'Atig001',
    'Atig_122',
    'A_tigris8450',
]

family_1 = [
    'Atig_122',
    'A_tigris8450',
]

family_2 = [
    'A.tig_9721',
    'A.tig_12512',
    'A.tig_12513',  
]

family_3 = [
    'Atig_4278',
    'Atig_6993',
    'Atig_9177',
]
parth_animals = [
    'A_tigris8450',
    'A.tig_12512',
    'A.tig_12513',
    'Atig_6993',
    'Atig_9177',
]

#1b9e77
#d95f02
#7570b3
#e7298a
family_colors = {
    'Atig001':'#1b9e77',
    'Atig003':'#1b9e77',
    
    'Atig_122'    :'#d95f02',
    'A_tigris8450':'#7570b3',
    
    'A.tig_9721' :'#d95f02',
    'A.tig_12512':'#7570b3',
    'A.tig_12513':'#e7298a',  
    
    'Atig_4278':'#d95f02',
    'Atig_6993':'#7570b3',
    'Atig_9177':'#e7298a',
}


color_names = dict(zip(animal_names,colors))
color_ids = dict(zip(animal_ids, colors))
id_to_name = dict(zip(animal_ids, animal_names))
name_to_id = dict(zip(animal_names, animal_ids))
majorFontSize = 10
minorFontSize = 9
sns.set(font_scale=1.0, style='whitegrid')
In [3]:
def change_name(name, map_dict=id_to_name):
    name = map_dict[name]
    return name

Accessory Functions

Here are a few accessory functions that I use later in the analysis.

In [4]:
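def col_name_for_max(row, cols_to_check):
    # idxmax returns the column label of the largest value; Series.argmax
    # returns an integer position in current pandas releases.
    if row[cols_to_check].max() > 0:
        max_col = row[cols_to_check].idxmax()
    else:
        max_col = 'Unclassified'

    return max_col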
In [6]:
def apply_df(df, func, *args):
    return df.apply(lambda x: func(x, *args), axis=1)

def apply_by_multiprocessing(df, func, workers, *args):
    pool = mp.Pool(processes=workers)
    result = [pool.apply_async(apply_df, args = (d, func) + args) for d in np.array_split(df, workers)]
    output = [p.get() for p in result]
    pool.close()
    return pd.concat(output)
In [7]:
def round_up_to_even(f):
    return math.ceil(f / 2.) * 2
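
To make the behaviour of these helpers concrete, here is a minimal sketch that applies the same rules to plain Python values. The dictionary-based `col_name_for_max_dict` is a hypothetical stand-in for `col_name_for_max` (which operates on pandas rows), and the column names and values are invented for illustration.

```python
import math


def round_up_to_even(f):
    # Same rule as the helper above: round up to the next even integer.
    return math.ceil(f / 2.) * 2


def col_name_for_max_dict(row, cols_to_check):
    # Hypothetical dict-based analogue of col_name_for_max: return the key
    # with the largest value, or 'Unclassified' if no value is positive.
    best = max(cols_to_check, key=lambda c: row[c])
    return best if row[best] > 0 else 'Unclassified'


cols = ['col_a', 'col_b', 'col_c']                      # invented column names
scored = {'col_a': 0.2, 'col_b': 0.7, 'col_c': 0.1}     # invented values
unscored = {'col_a': 0.0, 'col_b': 0.0, 'col_c': 0.0}

print(col_name_for_max_dict(scored, cols))
print(col_name_for_max_dict(unscored, cols))
print(round_up_to_even(3.2), round_up_to_even(4.1))
```

The first call picks the column with the largest value, the second falls back to 'Unclassified' because nothing is positive, and the last line rounds 3.2 and 4.1 up to the next even numbers.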

Initial Microsatellite Genotyping of 8450 and 122

This figure was generated in Excel before I joined the lab. I include it here both to show the data and to transfer the image for the manuscript upload.

In [8]:
%%bash
cp ../fig/old\ 8450\ and\ 122\ genotypes.pdf ../fig2/Figure2A.pdf
In [9]:
Image('../fig/old 8450 and 122 genotypes.png', height=600, width=600)
Out[9]:

Population Microsatellite Analysis and Figures

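The following Python class can be used to bin and manipulate microsatellite genotyping data and to calculate internal relatedness. The input must follow a specific layout: the class reads the file with pd.read_excel and expects a sample ID column (or separate sample name and microsatellite columns when split_id=False) followed by two allele sizes and two heights.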

In [ ]:
# %load ../bin/blizzard2.py
#!/usr/bin/env python
# Author: Duncan Tormey
# Email: dut@stowers.org or duncantormey@gmail.com

from __future__ import print_function
from __future__ import division
from collections import Counter
import pandas as pd


class MicrosatellitePopulation(object):
    """
    Class for calculating internal relatedness and homozygosity by loci for a set of individuals'
    microsatellite genotypes. Also provides a way to filter data prior to analysis.
    """

    def __init__(self, data_file_path, split_id=True):

        self.data_file_path = data_file_path
        self.genotypes_df = pd.read_excel(self.data_file_path)
        if split_id:
            self.genotypes_df.columns = ['sample_id', 'size_1', 'size_2', 'height_1', 'height_2']
            self.genotypes_df['sample_name'] = self.genotypes_df['sample_id'].apply(lambda x: x.split('-')[0])
            self.genotypes_df['micro_sat'] = self.genotypes_df['sample_id'].apply(lambda x: x.split('-')[1])
        else:
            self.genotypes_df.columns = ['sample_name', 'micro_sat', 'size_1', 'size_2', 'height_1', 'height_2']

        self.genotypes_df.size_2.fillna(self.genotypes_df.size_1, inplace=True)
        self.genotypes_df.drop_duplicates(inplace=True)
        self.population_sizes = {}
        self.sat_bins = {}
        self.population_intervals = {}
        self.binned_population_sizes = {}
        self.population_size_frequencies = {}


    def remove_microsattelite(self, micro_sat):
        """Removes a specific microsatellite from all individuals in population data"""
        self.genotypes_df = self.genotypes_df[self.genotypes_df.micro_sat != micro_sat]

    def remove_sample(self, sample_name):
        """Removes an individual from the population based on sample name"""
        self.genotypes_df = self.genotypes_df[self.genotypes_df.sample_name != sample_name]
        self.genotypes_df = self.genotypes_df[self.genotypes_df.sample_name != str(sample_name)]

    def get_population_sizes(self):
        """Populates the population sizes dictionary with all alleles across all animals"""
        for sat in self.genotypes_df.micro_sat.unique():
            size_dist = \
                self.genotypes_df[self.genotypes_df.micro_sat == sat].filter(regex='size').stack().reset_index()[
                    0].tolist()
            size_dist = [round(size) for size in size_dist]
            self.population_sizes[sat] = size_dist

    def get_sat_bins(self):
        """Determines the number of bins for each microsatellite, based on 3 nucleotide windows"""
        for sat in self.population_sizes:
            bins = round(float(max(self.population_sizes[sat]) -
                               min(self.population_sizes[sat])) / 3.0)
            if bins <= 0:
                bins = 1
            self.sat_bins[sat] = bins

    def get_population_intervals(self):
        """Determines the intervals for the number of bins for each microsatellite based on
        self.get_sat_bins"""
        for sat in self.population_sizes:
            out, bins = pd.cut(self.population_sizes[sat],
                               bins=self.sat_bins[sat],
                               retbins=True)

            intervals = [
                tuple(
                    float(i)
                    for i in str(o).translate(None, "()[]").split(', ')
                )
                for o in set(out)
                ]
            self.population_intervals[sat] = intervals

    def in_interval(self, size, intervals):
        """Takes an allele size and a list of intervals and returns the upper bound of the
        interval in which the allele resides (None if no interval matches)"""
        matched = None
        for interval in intervals:
            if interval[0] < size <= interval[1]:
                matched = interval[1]
                break
        return matched

    def get_binned_population_sizes(self):
        """Assigns each of the alleles in population_sizes to an interval"""
        for sat in self.population_sizes:
            self.binned_population_sizes[sat] = [self.in_interval(size, self.population_intervals[sat])
                                                 for size in self.population_sizes[sat]]

    def get_population_size_frequencies(self):
        """Determines the frequency of each allele in binned_population_sizes."""
        self.population_size_frequencies = {
            key: {
                k: float(v) / float(len(val))
                for k, v in Counter(val).items()
                }
            for key, val in self.binned_population_sizes.items()
            }

    def bin_data(self):
        """Bins the data for each sample"""
        self.get_population_sizes()
        self.get_sat_bins()
        self.get_population_intervals()
        self.get_binned_population_sizes()
        self.get_population_size_frequencies()
        self.genotypes_df['binned_size_1'] = self.genotypes_df.apply(lambda x:
                                                                     self.in_interval(round(x['size_1']),
                                                                                      self.population_intervals[
                                                                                          x['micro_sat']]),
                                                                     axis=1)
        self.genotypes_df['binned_size_2'] = self.genotypes_df.apply(lambda x:
                                                                     self.in_interval(round(x['size_2']),
                                                                                      self.population_intervals[
                                                                                          x['micro_sat']]),
                                                                     axis=1)

    def calc_internal_relatedness(self):
        """Calculates internal relatedness and homozygosity by loci for each sample"""
        self.bin_data()
        self.ir_df = []
        for sample_name in self.genotypes_df.sample_name.unique():
            l_df = self.genotypes_df[self.genotypes_df.sample_name == sample_name]
            e_homo = 0
            e_hetero = 0
            num_hom_loci = 0
            num_loci = 0
            l_df.fillna(0.0, inplace=True)
            for row in l_df.itertuples():
                if row.binned_size_1 != 0.0 and row.binned_size_2 != 0.0:
                    num_loci += 1
                    if row.binned_size_1 == row.binned_size_2:
                        num_hom_loci += 1
                        e_homo += self.population_size_frequencies[row.micro_sat][row.binned_size_1]
                        e_homo += self.population_size_frequencies[row.micro_sat][row.binned_size_2]
                    else:
                        e_hetero += self.population_size_frequencies[row.micro_sat][row.binned_size_1]
                        e_hetero += self.population_size_frequencies[row.micro_sat][row.binned_size_2]

            score = e_homo / (e_hetero + e_homo)
            ir = (2 * num_hom_loci - (e_homo + e_hetero)) / (2 * num_loci - (e_homo + e_hetero))
            self.ir_df.append({'sample_name': sample_name, 'homozygosity_by_loci': score,
                               'internal_relatedness': ir, 'num_hom_loci': num_hom_loci,
                               'total_loci': num_loci})
        self.ir_df = pd.DataFrame(self.ir_df)
        self.ir_df = self.ir_df[['sample_name', 'homozygosity_by_loci',
                                 'internal_relatedness', 'num_hom_loci', 'total_loci']]
        self.ir_df = self.ir_df.sort_values('internal_relatedness', ascending=False)
        self.ir_df.reset_index(inplace=True, drop=True)


if __name__ == '__main__':
    print('')
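
The class bins allele sizes into windows of roughly three nucleotides and labels each allele with the upper edge of its bin. The sketch below is a simplified, pure-Python stand-in for that step, not the class's own code: `bin_upper_edges` is a hypothetical name, the allele sizes are invented, and the edge handling differs slightly from `pd.cut`, which pads the outer edges of the range.

```python
import math


def bin_upper_edges(sizes, window=3.0):
    """Map each allele size to the upper edge of its equal-width bin."""
    lo, hi = min(sizes), max(sizes)
    if hi == lo:
        return [hi for _ in sizes]
    # Number of bins: the size range split into ~3-nucleotide windows, at least one.
    n_bins = max(1, int(round((hi - lo) / window)))
    width = (hi - lo) / float(n_bins)
    edges = [lo + width * (i + 1) for i in range(n_bins)]
    binned = []
    for size in sizes:
        # Bins are right-closed, (lower, upper]; the minimum goes in the first bin.
        idx = int(math.ceil((size - lo) / width)) - 1
        idx = min(n_bins - 1, max(0, idx))
        binned.append(edges[idx])
    return binned


sizes = [100, 101, 106, 112]   # invented allele sizes
print(bin_upper_edges(sizes))  # [103.0, 103.0, 106.0, 112.0]
```

Sizes 100 and 101 fall into the same window and therefore count as the same allele, which is the point of binning before homozygosity is scored.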

First, I load the microsatellite data provided to me by Peter Baumann. This file contains the genotyping data for the more up-to-date panel of microsatellites. I then restructure the data so that it works with the Python class shown above.

In [10]:
populationData = pd.read_excel('../data/marmorata_mirco_test_data.xlsx')
populationData.head()
Out[10]:
Sample ID Size 1 Size 2 Height 1 Height 2
0 9387-A105-LizardMS NaN NaN NaN NaN
1 9387-Ai5013-LizardMS 227.66 253.09 7229.0 6420.0
2 9387-Ai5043-LizardMS 175.87 NaN 824.0 NaN
3 9387-Cvanu24-LizardMS 200.09 NaN 6737.0 NaN
4 9387-Cvanu7-LizardMS NaN NaN NaN NaN
In [11]:
marmPopulation = bliz.MicrosatellitePopulation('../data/marmorata_mirco_test_data.xlsx', split_id=True)
In [12]:
len(marmPopulation.genotypes_df.micro_sat.unique())
Out[12]:
12
In [13]:
for x in marmPopulation.genotypes_df.micro_sat.unique():
    print(x)
A105
Ai5013
Ai5043
Cvanu24
Cvanu7
D106
D107
D111
MS1
MS6
MS7
MS8
In [14]:
for x in marmPopulation.genotypes_df.sample_name.unique():
    print(x)
9387
9525
95261
9526
9721
9722
115841
12512
125131
137461
137471
16996
16997
17033
17034
17049
17050
17051
17330
176291
17630
176311
19687
116001
11784
14796
16998
17114
184
185
187
195
234
261
267
268
8033
8873
171151
171161
171171
17118
122_tig
8450_tig
In [15]:
marmPopulation.genotypes_df.head()
Out[15]:
sample_id size_1 size_2 height_1 height_2 sample_name micro_sat
0 9387-A105-LizardMS NaN NaN NaN NaN 9387 A105
1 9387-Ai5013-LizardMS 227.66 253.09 7229.0 6420.0 9387 Ai5013
2 9387-Ai5043-LizardMS 175.87 175.87 824.0 NaN 9387 Ai5043
3 9387-Cvanu24-LizardMS 200.09 200.09 6737.0 NaN 9387 Cvanu24
4 9387-Cvanu7-LizardMS NaN NaN NaN NaN 9387 Cvanu7
In [16]:
marmPopulation.genotypes_df.groupby('sample_name').count().sort_values('size_1').head()
Out[16]:
sample_id size_1 size_2 height_1 height_2 micro_sat
sample_name
9526 1 1 1 1 1 1
9387 12 7 7 7 4 12
95261 11 8 8 8 2 11
9525 12 10 10 10 5 12
122_tig 12 11 11 11 5 12
In [17]:
marmPopulation.genotypes_df[marmPopulation.genotypes_df.sample_name == "9387"]
Out[17]:
sample_id size_1 size_2 height_1 height_2 sample_name micro_sat
0 9387-A105-LizardMS NaN NaN NaN NaN 9387 A105
1 9387-Ai5013-LizardMS 227.66 253.09 7229.0 6420.0 9387 Ai5013
2 9387-Ai5043-LizardMS 175.87 175.87 824.0 NaN 9387 Ai5043
3 9387-Cvanu24-LizardMS 200.09 200.09 6737.0 NaN 9387 Cvanu24
4 9387-Cvanu7-LizardMS NaN NaN NaN NaN 9387 Cvanu7
5 9387-D106-LizardMS NaN NaN NaN NaN 9387 D106
6 9387-D107-LizardMS NaN NaN NaN NaN 9387 D107
7 9387-D111-LizardMS NaN NaN NaN NaN 9387 D111
8 9387-MS1-LizardMS 216.81 216.81 4940.0 NaN 9387 MS1
9 9387-MS6-LizardMS 157.99 174.19 2969.0 3342.0 9387 MS6
10 9387-MS7-LizardMS 235.80 259.11 7798.0 5524.0 9387 MS7
11 9387-MS8-LizardMS 109.11 113.24 5579.0 6037.0 9387 MS8
In [18]:
marmPopulation.genotypes_df[marmPopulation.genotypes_df.sample_name == "9526"]
Out[18]:
sample_id size_1 size_2 height_1 height_2 sample_name micro_sat
28 9526-Cvanu7-LizardMS 332.55 344.0 1058.0 1000.0 9526 Cvanu7
In [19]:
marmPopulation.genotypes_df[marmPopulation.genotypes_df.sample_name == "95261"]
Out[19]:
sample_id size_1 size_2 height_1 height_2 sample_name micro_sat
24 95261-A105-LizardMS NaN NaN NaN NaN 95261 A105
25 95261-Ai5013-LizardMS 220.42 253.09 4216.0 3469.0 95261 Ai5013
26 95261-Ai5043-LizardMS 175.84 175.84 10957.0 NaN 95261 Ai5043
27 95261-Cvanu24-LizardMS 200.00 200.00 5789.0 NaN 95261 Cvanu24
29 95261-D106-LizardMS 291.12 299.20 2947.0 3348.0 95261 D106
30 95261-D107-LizardMS 127.82 127.82 2117.0 NaN 95261 D107
31 95261-D111-LizardMS NaN NaN NaN NaN 95261 D111
32 95261-MS1-LizardMS 216.78 216.78 11264.0 NaN 95261 MS1
33 95261-MS6-LizardMS 174.15 174.15 4377.0 NaN 95261 MS6
34 95261-MS7-LizardMS NaN NaN NaN NaN 95261 MS7
35 95261-MS8-LizardMS 109.17 109.17 3661.0 NaN 95261 MS8
In [20]:
marmPopulation.remove_sample('9387')
marmPopulation.remove_sample('9526')
In [21]:
for x in marmPopulation.genotypes_df.sample_name.unique():
    print(x, len(x))
9525 4
95261 5
9721 4
9722 4
115841 6
12512 5
125131 6
137461 6
137471 6
16996 5
16997 5
17033 5
17034 5
17049 5
17050 5
17051 5
17330 5
176291 6
17630 5
176311 6
19687 5
116001 6
11784 5
14796 5
16998 5
17114 5
184 3
185 3
187 3
195 3
234 3
261 3
267 3
268 3
8033 4
8873 4
171151 6
171161 6
171171 6
17118 5
122_tig 7
8450_tig 8
In [22]:
for x in marmPopulation.genotypes_df.micro_sat.unique():
    print(x)
A105
Ai5013
Ai5043
Cvanu24
Cvanu7
D106
D107
D111
MS1
MS6
MS7
MS8
In [23]:
marmPopulation.genotypes_df[marmPopulation.genotypes_df.sample_name=='8450_tig']
Out[23]:
sample_id size_1 size_2 height_1 height_2 sample_name micro_sat
519 8450_tig-A105-LizardMS NaN NaN NaN NaN 8450_tig A105
520 8450_tig-Ai5013-LizardMS 257.14 257.14 32277.0 NaN 8450_tig Ai5013
521 8450_tig-Ai5043-LizardMS 178.81 178.81 7372.0 NaN 8450_tig Ai5043
522 8450_tig-Cvanu24-LizardMS 199.92 199.92 32614.0 NaN 8450_tig Cvanu24
523 8450_tig-Cvanu7-LizardMS 328.32 328.32 13329.0 NaN 8450_tig Cvanu7
524 8450_tig-D106-LizardMS 299.26 299.26 18096.0 NaN 8450_tig D106
525 8450_tig-D107-LizardMS 178.88 178.88 114.0 NaN 8450_tig D107
526 8450_tig-D111-LizardMS 150.00 150.00 10241.0 NaN 8450_tig D111
527 8450_tig-MS1-LizardMS 216.99 216.99 28176.0 NaN 8450_tig MS1
528 8450_tig-MS6-LizardMS 174.18 174.18 32639.0 NaN 8450_tig MS6
529 8450_tig-MS7-LizardMS 243.79 243.79 14502.0 NaN 8450_tig MS7
530 8450_tig-MS8-LizardMS 114.31 114.31 25818.0 NaN 8450_tig MS8
In [24]:
marmPopulation.genotypes_df[marmPopulation.genotypes_df.sample_name=='122_tig']
Out[24]:
sample_id size_1 size_2 height_1 height_2 sample_name micro_sat
507 122_tig-A105-LizardMS NaN NaN NaN NaN 122_tig A105
508 122_tig-Ai5013-LizardMS 213.73 257.15 31746.0 24594.0 122_tig Ai5013
509 122_tig-Ai5043-LizardMS 181.65 181.65 11064.0 NaN 122_tig Ai5043
510 122_tig-Cvanu24-LizardMS 200.00 200.00 32659.0 NaN 122_tig Cvanu24
511 122_tig-Cvanu7-LizardMS 328.41 351.65 9057.0 4693.0 122_tig Cvanu7
512 122_tig-D106-LizardMS 295.40 299.43 12924.0 11387.0 122_tig D106
513 122_tig-D107-LizardMS 127.91 127.91 665.0 NaN 122_tig D107
514 122_tig-D111-LizardMS 150.00 150.00 10981.0 NaN 122_tig D111
515 122_tig-MS1-LizardMS 217.04 217.04 30346.0 NaN 122_tig MS1
516 122_tig-MS6-LizardMS 174.27 174.27 32567.0 NaN 122_tig MS6
517 122_tig-MS7-LizardMS 243.81 282.56 16395.0 14654.0 122_tig MS7
518 122_tig-MS8-LizardMS 113.39 114.43 30780.0 25017.0 122_tig MS8
In [25]:
marmPopulation.genotypes_df[marmPopulation.genotypes_df.sample_name=='19687']
Out[25]:
sample_id size_1 size_2 height_1 height_2 sample_name micro_sat
255 19687-A105-LizardMS 255.79 255.79 37.0 NaN 19687 A105
256 19687-Ai5013-LizardMS 220.28 220.28 6972.0 NaN 19687 Ai5013
257 19687-Ai5043-LizardMS 175.83 175.83 1236.0 NaN 19687 Ai5043
258 19687-Cvanu24-LizardMS 199.91 199.91 7414.0 NaN 19687 Cvanu24
259 19687-Cvanu7-LizardMS 331.56 331.56 1350.0 NaN 19687 Cvanu7
260 19687-D106-LizardMS 290.91 290.91 2567.0 NaN 19687 D106
261 19687-D107-LizardMS 170.47 170.47 171.0 NaN 19687 D107
262 19687-D111-LizardMS 150.10 150.10 1522.0 NaN 19687 D111
263 19687-MS1-LizardMS 216.86 216.86 8039.0 NaN 19687 MS1
264 19687-MS6-LizardMS 174.16 174.16 6526.0 NaN 19687 MS6
265 19687-MS7-LizardMS 290.12 290.12 7152.0 NaN 19687 MS7
266 19687-MS8-LizardMS 109.03 109.03 6681.0 NaN 19687 MS8
In [26]:
marmPopulation.genotypes_df[marmPopulation.genotypes_df.sample_name=='17330']
Out[26]:
sample_id size_1 size_2 height_1 height_2 sample_name micro_sat
204 17330-A105-LizardMS 255.89 255.89 496.0 NaN 17330 A105
205 17330-Ai5013-LizardMS 220.28 220.28 31495.0 NaN 17330 Ai5013
206 17330-Ai5043-LizardMS 175.95 175.95 16340.0 NaN 17330 Ai5043
207 17330-Cvanu24-LizardMS 199.91 199.91 32360.0 NaN 17330 Cvanu24
208 17330-Cvanu7-LizardMS 331.64 331.64 9399.0 NaN 17330 Cvanu7
209 17330-D106-LizardMS 294.86 294.86 17247.0 NaN 17330 D106
210 17330-D107-LizardMS 178.59 268.36 612.0 271.0 17330 D107
211 17330-D111-LizardMS 150.19 150.19 5223.0 NaN 17330 D111
212 17330-MS1-LizardMS 216.86 216.86 28794.0 NaN 17330 MS1
213 17330-MS6-LizardMS 174.12 174.12 32581.0 NaN 17330 MS6
214 17330-MS7-LizardMS 259.08 259.08 19101.0 NaN 17330 MS7
215 17330-MS8-LizardMS 109.14 109.14 32241.0 NaN 17330 MS8
In [27]:
listsOfAnimalsAndMSUsed = pd.DataFrame({'list of microsatellite markers used':pd.Series(marmPopulation.genotypes_df.micro_sat.unique()),'list of animals used':pd.Series(marmPopulation.genotypes_df.sample_name.unique())})
listsOfAnimalsAndMSUsed.to_excel('../data/listsOfAnimalsAndMSUsed.xlsx')
In [28]:
marmPopulation.calc_internal_relatedness()
/home/dut/anaconda3/envs/anaconda2/lib/python2.7/site-packages/pandas/core/frame.py:2852: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  downcast=downcast, **kwargs)
In [29]:
marmPopulation.ir_df.head()
Out[29]:
sample_name homozygosity_by_loci internal_relatedness num_hom_loci total_loci
0 8450_tig 1.000000 1.000000 11 11
1 19687 1.000000 1.000000 12 12
2 17330 0.989654 0.800759 11 12
3 125131 0.897412 0.614075 10 12
4 12512 0.898693 0.506762 9 12
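
The scores above follow from the formula used in calc_internal_relatedness: with H homozygous loci, N scored loci, and Σf the summed population frequencies of both alleles at every scored locus, IR = (2H − Σf) / (2N − Σf). The sketch below reproduces that arithmetic on invented allele frequencies; the function name, loci, and frequencies are hypothetical and chosen only to keep the numbers easy to check.

```python
def internal_relatedness(genotypes, freqs):
    """genotypes: list of (locus, allele_a, allele_b);
    freqs: {locus: {allele: population frequency}}."""
    num_hom = 0
    num_loci = 0
    freq_sum = 0.0
    for locus, a, b in genotypes:
        num_loci += 1
        if a == b:
            num_hom += 1
        # Both alleles contribute their frequency, homozygous or not.
        freq_sum += freqs[locus][a] + freqs[locus][b]
    return (2 * num_hom - freq_sum) / (2 * num_loci - freq_sum)


# Invented binned allele frequencies for two loci.
freqs = {'locus_1': {217: 0.5, 220: 0.5},
         'locus_2': {174: 0.25, 158: 0.75}}

all_hom = [('locus_1', 217, 217), ('locus_2', 174, 174)]
all_het = [('locus_1', 217, 220), ('locus_2', 174, 158)]

print(internal_relatedness(all_hom, freqs))  # 1.0
print(internal_relatedness(all_het, freqs))  # -1.0
```

An animal that is homozygous at every scored locus always gets IR = 1.0 because the numerator and denominator become identical, which matches the rows for 8450_tig and 19687.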
In [30]:
marm_ir = marmPopulation.ir_df
sns.set(style='whitegrid')
plt.figure(figsize=(6.4,1.2))
ax = sns.distplot(marm_ir['internal_relatedness'], kde=False, rug=False, bins=30, color='darkred', hist_kws={'alpha':1.0,'edgecolor':"k", 'linewidth':1})
ax.set_ylim((0,8.5))
ax.set_ylabel('Number of Animals', fontsize=minorFontSize)
ax.set_xlabel('Internal Relatedness', fontsize=minorFontSize)
#ax.set_title('Internal Relatedness of 44 Aspidoscelis marmorata')
ax.arrow(0.978,2.8,0.0,-0.5, width=0.01, head_width=0.035, head_length=0.25, color='#3498db')
ax.arrow(0.335 ,2.8,0.0,-0.5, width=0.01, head_width=0.035, head_length=0.25, color="#9b59b6")
ax.set_xlim(-0.5,1.1)
ax.set_xticklabels(ax.get_xticks(), fontsize=minorFontSize)
ax.set_yticklabels(ax.get_yticks(), fontsize=minorFontSize)
ax.text(-0.15, 1.1, 'B', transform=ax.transAxes,
      fontsize=minorFontSize, fontweight='bold', va='top', ha='right')

fig = ax.get_figure()
fig.savefig('../fig2/Figure2B.pdf', bbox_inches='tight')
In [31]:
marmPopulation.genotypes_df[['sample_name', 'micro_sat','size_1', 'size_2', 'height_1', 'height_2', 'binned_size_1', 'binned_size_2']].head()
Out[31]:
sample_name micro_sat size_1 size_2 height_1 height_2 binned_size_1 binned_size_2
12 9525 A105 NaN NaN NaN NaN NaN NaN
13 9525 Ai5013 220.34 252.96 2195.0 1931.0 220.333 253.944
14 9525 Ai5043 175.71 175.71 7536.0 NaN 179.000 179.000
15 9525 Cvanu24 200.00 200.00 4915.0 NaN 200.200 200.200
16 9525 Cvanu7 332.59 352.53 651.0 392.0 334.182 355.818
In [32]:
supplementalTable1 = marmPopulation.genotypes_df[['sample_name', 'micro_sat','size_1', 'size_2', 'height_1', 'height_2', 'binned_size_1', 'binned_size_2']]
supplementalTable1.to_csv('../data/supplemental_data/supplemental_table_1.csv',index=False)

Here I calculate the size ranges for each of the microsatellites in the population.

In [33]:
stack1 = marmPopulation.genotypes_df[['micro_sat', 'size_1']].copy()
stack2 = marmPopulation.genotypes_df[['micro_sat', 'size_2']].copy()
stack1.columns = ['micro_sat', 'size']
stack2.columns = ['micro_sat', 'size']
allSizesDf = pd.concat([stack1, stack2])
allSizesDf.head()
Out[33]:
micro_sat size
12 A105 NaN
13 Ai5013 220.34
14 Ai5043 175.71
15 Cvanu24 200.00
16 Cvanu7 332.59

I use pandas groupby followed by describe to generate the following summary table.

In [34]:
microSatSizeRangesDf = allSizesDf.groupby('micro_sat').describe().reset_index().reset_index()
microSatSizeRangesDf
Out[34]:
index micro_sat size
count mean std min 25% 50% 75% max
0 0 A105 74.0 246.969189 18.979147 197.16 255.7150 255.835 255.8975 256.03
1 1 Ai5013 84.0 231.230714 16.885507 201.63 220.2225 227.445 250.9125 257.15
2 2 Ai5043 84.0 176.634762 1.591840 175.50 175.7700 175.840 176.0700 181.65
3 3 Cvanu24 84.0 199.933571 0.097877 199.73 199.9100 199.910 200.0000 200.17
4 4 Cvanu7 82.0 337.467683 9.083376 328.20 331.5650 331.670 344.2500 361.60
5 5 D106 84.0 300.888214 21.955367 274.69 294.8100 298.900 299.3800 484.02
6 6 D107 90.0 158.812222 36.975351 89.65 127.7225 166.365 174.5175 268.36
7 7 D111 80.0 152.250875 6.991427 149.90 150.0000 150.100 150.2000 207.90
8 8 MS1 84.0 217.545357 3.686983 216.62 216.7600 216.850 216.9300 239.14
9 9 MS6 84.0 171.701786 5.910574 157.80 174.0200 174.130 174.2400 176.26
10 10 MS7 82.0 268.911098 16.973769 235.67 259.0825 266.785 289.0675 298.10
11 11 MS8 84.0 111.705119 2.947997 108.80 109.1100 109.285 113.3950 119.71

I then subset the table to show only the min and max values, and manually copied these into the Excel document that Rob sent me.

In [35]:
# Copy the slice so that adding a column does not modify a view of the original
df = microSatSizeRangesDf['size'][['min', 'max']].copy()
df['micro_sat'] = microSatSizeRangesDf['micro_sat']
df
Out[35]:
min max micro_sat
0 197.16 256.03 A105
1 201.63 257.15 Ai5013
2 175.50 181.65 Ai5043
3 199.73 200.17 Cvanu24
4 328.20 361.60 Cvanu7
5 274.69 484.02 D106
6 89.65 268.36 D107
7 149.90 207.90 D111
8 216.62 239.14 MS1
9 157.80 176.26 MS6
10 235.67 298.10 MS7
11 108.80 119.71 MS8
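
The min/max table above is a per-microsatellite reduction over both allele columns with missing values ignored. A minimal pure-Python sketch of the same reduction is shown below; `size_ranges` is a hypothetical helper, and the records are a hand-picked subset of the sizes printed earlier, so the resulting ranges are not the population ranges.

```python
import math
from collections import defaultdict


def size_ranges(records):
    """records: iterable of (micro_sat, size); NaN sizes are skipped."""
    sizes = defaultdict(list)
    for micro_sat, size in records:
        if not math.isnan(size):
            sizes[micro_sat].append(size)
    # Microsatellites with no called allele do not appear in the result.
    return {sat: (min(vals), max(vals)) for sat, vals in sizes.items()}


records = [
    ('A105', float('nan')),
    ('Ai5013', 220.34), ('Ai5013', 252.96),
    ('Ai5043', 175.71), ('Ai5043', 175.71),
]
print(size_ranges(records))
```

pandas describe skips NaN in the same way, which is why the count column in the summary table differs between microsatellites.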

Here I read in the final table and write it to a file named 'supplemental_table_2.xls'.

In [36]:
supplementalTable2 = pd.read_excel('../data/Microsatellite_primer_pairs_used_for_genotyping.xlsx')
supplementalTable2
Out[36]:
Microsatellite Oligo 1 sequence (5’ → 3’) Oligo 2 sequence (5’ → 3’) Min Size Max Size
0 A105 AATCCTGAACCTACGGTAAGC TGCCAGAAAATAGAGGGAAG 197.16 256.03
1 Ai5013 AATTAATGTGCAGCACTAT GGCAGTTTTTCAGCTAAG 201.63 257.15
2 Ai5043 AAAAAGAAAAGGAAGAACTAA TGAGACAAGTTGGGTAGA 175.50 181.65
3 Cvanu24 TTTAATGCATCCACTGAGTC GGAATATAGTGGCATATCAG 199.73 200.17
4 Cvanu7 GACCAATAATGTGGAAGCTG ACATGGCTGAGTAATTGGTG 328.20 361.60
5 D106 TTAAAGCAGAGGTCAGGTTATC GATGGAAGAATAGGATGATGAA 274.69 484.02
6 D107 TACCCACCTGGAGATGTTTAG AGGACGCCTTAAAATAGGAAG 89.65 268.36
7 D111 TGGAGGCAGTCTTGGTATC GAACATTGACCGCATCAC 149.90 207.90
8 MS1 TGCATGATGGAGGAATCTTC CTAGTGGTGATAGAAACATGG 216.62 239.14
9 MS6 CACACCCATATTATAAGTGG CATTCAGATGAAACCTAACC 157.80 176.26
10 MS7 AACTAAGTGCTAAGTGTGAC ACAGTCTTAGAGATCACAAG 235.67 298.10
11 MS8 ACACCCAAAGTCCTCAACAG CTAGTACATGTGTAAGGGTG 108.80 119.71
In [37]:
supplementalTable2.to_excel('../data/supplemental_data/supplemental_table_2.xls',index=False)

Genomic Sequencing Data

Here I read in the table that describes all of the sequencing data. This table is referenced where we first describe the genome.

In [38]:
supplementalTable3 = pd.read_csv('../../ncbi_genome_delivery/data/supplemental_sequencing_data.csv', sep=',')
supplementalTable3
Out[38]:
library id sample name tissue sex Instrument Model design description Purpose order type selection read length total reads or read pairs
0 L11871 A_tigris8450 liver Female Illumina HiSeq 2500 5kb Mate Pair DNA Sequencing of Aspidoscelis marmoratus: A_tigris8450 Mate Pair (1ug or 4ug) RANDOM 150 68124235.0
1 L11871-1 A_tigris8450 liver Female Illumina HiSeq 2500 8kb Mate Pair DNA Sequencing of Aspidoscelis marmoratus: A_tigris8450 Mate Pair (1ug or 4ug) RANDOM 150 62138640.0
2 L11871-2 A_tigris8450 liver Female Illumina HiSeq 2500 2-15kb Mate Pair DNA Sequencing of Aspidoscelis marmoratus: A_tigris8450 Mate Pair (1ug or 4ug) RANDOM 150 67949265.0
3 L13136-2 A_tigris8450 liver Female Illumina HiSeq 2500 Paired End DNA Sequencing of Aspidoscelis marmoratus: A_tigris8450 DNA-Seq (1ug) RANDOM 250 376374707.0
4 L15771 A_tigris8450 liver Female Illumina HiSeq 2500 CHICAGO DNA Sequencing of Aspidoscelis marmoratus: A_tigris8450 CHICAGO RANDOM 100 207139607.0
5 L15772 A_tigris8450 liver Female Illumina HiSeq 2500 CHICAGO DNA Sequencing of Aspidoscelis marmoratus: A_tigris8450 CHICAGO RANDOM 100 52923608.0
6 S11870 A_tigris8450 liver Female Illumina MiSeq 40kb Mate Pair DNA Sequencing of Aspidoscelis marmoratus: A_tigris8450 Mate Pair RANDOM 184 7325966.0
7 S11870 A_tigris8450 liver Female Illumina MiSeq 40kb Mate Pair DNA Sequencing of Aspidoscelis marmoratus: A_tigris8450 Mate Pair RANDOM 250 7578267.0
8 L13087 Atig001 liver not collected Illumina HiSeq 2500 Paired End DNAseq for Aspidoscelis marmoratus Variant Calling DNA-Seq (1ug) RANDOM 150 96536963.0
9 L13088 Atig003 liver not collect Illumina HiSeq 2500 Paired End DNAseq for Aspidoscelis marmoratus Variant Calling DNA-Seq (1ug) RANDOM 150 71079632.0
10 L13088-1 Atig003 liver not collect Illumina HiSeq 2500 Paired End DNAseq for Aspidoscelis marmoratus Variant Calling DNA-Seq (1ug) RANDOM 150 37328268.0
11 L13136 A_tigris8450 liver Female Illumina HiSeq 2500 Paired End DNAseq for Aspidoscelis marmoratus Variant Calling DNA-Seq (1ug) RANDOM 150 57367890.0
12 L13136-1 A_tigris8450 liver Female Illumina HiSeq 2500 Paired End DNAseq for Aspidoscelis marmoratus Variant Calling DNA-Seq (1ug) RANDOM 150 46483161.0
13 L21676 Atig_122 tail Female Illumina HiSeq 2500 Paired End DNAseq for Aspidoscelis marmoratus Variant Calling DNA-Seq (100ng) RANDOM 150 321903175.0
14 L30698 Atig_9177 tail Female Illumina HiSeq 2500 Paired End DNAseq for Aspidoscelis marmoratus Variant Calling DNA-Seq (1ug) RANDOM 150 113372074.0
15 L30699 Atig_6993 tail Female Illumina HiSeq 2500 Paired End DNAseq for Aspidoscelis marmoratus Variant Calling DNA-Seq (1ug) RANDOM 150 100116690.0
16 L30700 Atig_4278 tail Female Illumina HiSeq 2500 Paired End DNAseq for Aspidoscelis marmoratus Variant Calling DNA-Seq (1ug) RANDOM 150 106787117.0
17 L30701 A.tig_12512 tail Female Illumina HiSeq 2500 Paired End DNAseq for Aspidoscelis marmoratus Variant Calling DNA-Seq (1ug) RANDOM 150 102760203.0
18 L30702 A.tig_12513 tail Female Illumina HiSeq 2500 Paired End DNAseq for Aspidoscelis marmoratus Variant Calling DNA-Seq (1ug) RANDOM 150 103885909.0
19 L30703 A.tig_9721 tail Female Illumina HiSeq 2500 Paired End DNAseq for Aspidoscelis marmoratus Variant Calling DNA-Seq (1ug) RANDOM 150 101265101.0
20 L11223 9601 blood Female Illumina HiSeq 2500 Paired End RNAseq for Aspidoscelis marmoratus Genome Annotation poly-A Stranded RNA-Seq PolyA Selection 100 232139414.0
21 L11224 11225 blood Male Illumina HiSeq 2500 Paired End RNAseq for Aspidoscelis marmoratus Genome Annotation poly-A Stranded RNA-Seq PolyA Selection 100 260779309.0
22 L11225 11677 blood Male Illumina HiSeq 2500 Paired End RNAseq for Aspidoscelis marmoratus Genome Annotation poly-A Stranded RNA-Seq PolyA Selection 100 189619876.0
23 L16900 S237-2 embryo not collected Illumina HiSeq 2500 Paired End RNAseq for Aspidoscelis marmoratus Genome Annotation poly-A Stranded RNA-Seq PolyA Selection 100 253832665.0
In [39]:
supplementalTable3.to_csv('../data/supplemental_data/supplemental_table_3.csv')

Genome Size Prediction with FACS

Next I need to copy over the Excel document that contains the analysis for genome size prediction. I had to manually copy the file from Google Drive to the N drive. Here I rename it as supplemental table 4.

In [40]:
%%bash
cp ../data/09cy569_acb-tableMarch2017.xlsx ../data/supplemental_data/supplemental_table_4.xlsx

All Scaffolds Genome Stats

The following Python module is used to manipulate and gather data from a FASTA-formatted sequence file.

In [41]:
# %load ../bin/fasta_classes.py
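
The contents of fasta_classes.py are not reproduced in this notebook. As a rough illustration of the kind of parsing such a module has to do, here is a minimal sketch that reads a FASTA-formatted stream and records the length of each sequence. The function name read_fasta_lengths and the toy scaffold names are hypothetical and are not part of the original module.

```python
import io


def read_fasta_lengths(handle):
    """Return {sequence_name: length} for a FASTA-formatted file-like object."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            # Keep only the first word of the header as the sequence name.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            # Sequence lines may be wrapped, so accumulate across lines.
            lengths[name] += len(line)
    return lengths


# Toy input; these scaffold names are invented for illustration.
example = io.StringIO(u">scaffold_a\nACGTNN\nACG\n>scaffold_b\nGGCC\n")
scaffold_lengths = read_fasta_lengths(example)
print(scaffold_lengths)
```

Summary statistics such as the minimum, maximum, mean, and median scaffold size can then be derived from the list of lengths.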
In [42]:
dovetailGenomeFile2 = '../../dovetail_genome_delivery/data/fastas/lizard_23Jun2015_piz6a.upper.fasta'
dtGenome2 = fa.Fasta_file(dovetailGenomeFile2)
dtGenome2.get_bc_stats()
perc_genome_covered_by_scaffolds_greater_than_1mb 98.4720241117
total bases 1639530780
perc_genome_covered_by_scaffolds_greater_than_100kb 99.4578671466
scaffolds_greater_than_100Mb 0
N90 7979424
scaffolds_greater_than_1Mb 90
median_scaffold 1450
perc_genome_covered_by_scaffolds_greater_than_10kb 99.6264638594
N50 32220929
max_scaffold 85027298
scaffolds_greater_than_10kb 223
mean_scaffold 428523.465761
min_scaffold 927
scaffolds_greater_than_10Mb 45
scaffolds_greater_than_100kb 133
number of scaffolds 3826
In [43]:
statDF2 = pd.DataFrame(dtGenome2.stats,index=['statistic'])
statDF2
Out[43]:
N50 N90 max_scaffold mean_scaffold median_scaffold min_scaffold number of scaffolds perc_A perc_C perc_G ... perc_T perc_genome_covered_by_scaffolds_greater_than_100kb perc_genome_covered_by_scaffolds_greater_than_10kb perc_genome_covered_by_scaffolds_greater_than_1mb scaffolds_greater_than_100Mb scaffolds_greater_than_100kb scaffolds_greater_than_10Mb scaffolds_greater_than_10kb scaffolds_greater_than_1Mb total bases
statistic 32220929 7979424 85027298 428523.465761 1450 927 3826 27.039101 20.122447 20.12903 ... 27.040218 99.457867 99.626464 98.472024 0 133 45 223 90 1639530780

1 rows × 21 columns

In [44]:
statDF2['non_n_GC%'] = statDF2[['perc_G','perc_C']].T.sum() / statDF2[['perc_A','perc_G','perc_C','perc_T']].T.sum() * 100.0
statDF2['non_n_AT%'] = statDF2[['perc_A','perc_T']].T.sum() / statDF2[['perc_A','perc_G','perc_C','perc_T']].T.sum() * 100.0

statDF2 = statDF2.T
statDF2['statistic'] = statDF2['statistic'].apply(lambda x: '%.5f' % x)
statDF2
Out[44]:
statistic
N50 32220929.00000
N90 7979424.00000
max_scaffold 85027298.00000
mean_scaffold 428523.46576
median_scaffold 1450.00000
min_scaffold 927.00000
number of scaffolds 3826.00000
perc_A 27.03910
perc_C 20.12245
perc_G 20.12903
perc_N 5.66920
perc_T 27.04022
perc_genome_covered_by_scaffolds_greater_than_100kb 99.45787
perc_genome_covered_by_scaffolds_greater_than_10kb 99.62646
perc_genome_covered_by_scaffolds_greater_than_1mb 98.47202
scaffolds_greater_than_100Mb 0.00000
scaffolds_greater_than_100kb 133.00000
scaffolds_greater_than_10Mb 45.00000
scaffolds_greater_than_10kb 223.00000
scaffolds_greater_than_1Mb 90.00000
total bases 1639530780.00000
non_n_GC% 42.67056
non_n_AT% 57.32944
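
The non-N GC and AT percentages above are the G+C and A+T percentages rescaled by the fraction of called (non-N) bases. A small sketch of that arithmetic, using the rounded base percentages printed in Out[43], so the result should agree with the table only up to rounding:

```python
# Base percentages copied from the Out[43] table above (perc_N is excluded).
perc = {'A': 27.039101, 'C': 20.122447, 'G': 20.12903, 'T': 27.040218}

# Percentage of the assembly made up of called (non-N) bases.
called = sum(perc.values())

# Rescale so that GC% and AT% sum to 100 over called bases only.
non_n_gc = (perc['G'] + perc['C']) / called * 100.0
non_n_at = (perc['A'] + perc['T']) / called * 100.0
print('%.5f %.5f' % (non_n_gc, non_n_at))
```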
In [45]:
statDF2 = statDF2.reset_index()
statDF2.columns = ['statistic', 'value']
In [46]:
statDF2
Out[46]:
statistic value
0 N50 32220929.00000
1 N90 7979424.00000
2 max_scaffold 85027298.00000
3 mean_scaffold 428523.46576
4 median_scaffold 1450.00000
5 min_scaffold 927.00000
6 number of scaffolds 3826.00000
7 perc_A 27.03910
8 perc_C 20.12245
9 perc_G 20.12903
10 perc_N 5.66920
11 perc_T 27.04022
12 perc_genome_covered_by_scaffolds_greater_than_100kb 99.45787
13 perc_genome_covered_by_scaffolds_greater_than_10kb 99.62646
14 perc_genome_covered_by_scaffolds_greater_than_1mb 98.47202
15 scaffolds_greater_than_100Mb 0.00000
16 scaffolds_greater_than_100kb 133.00000
17 scaffolds_greater_than_10Mb 45.00000
18 scaffolds_greater_than_10kb 223.00000
19 scaffolds_greater_than_1Mb 90.00000
20 total bases 1639530780.00000
21 non_n_GC% 42.67056
22 non_n_AT% 57.32944
In [47]:
statDF2.to_csv('../data/supplemental_data/supplemental_table_5.csv',index=False)

Large Scaffold Genome Stats

Here I load in the FASTA file containing all scaffolds in the genome greater than 10 kb using the fasta_classes.py module shown above. I use this module to gather basic statistics about the genome.

In [48]:
dovetailGenomeFile = '../data/gatk6/reference/tigris_scaffolds_filt_10000.fa'
dtGenome = fa.Fasta_file(dovetailGenomeFile)
dtGenome.get_bc_stats()
perc_genome_covered_by_scaffolds_greater_than_1mb 98.8412318344
total bases 1633406540
perc_genome_covered_by_scaffolds_greater_than_100kb 99.8307711563
scaffolds_greater_than_100Mb 0
N90 8340160
scaffolds_greater_than_1Mb 90
median_scaffold 296338
perc_genome_covered_by_scaffolds_greater_than_10kb 100.0
N50 32220929
max_scaffold 85027298
scaffolds_greater_than_10kb 223
mean_scaffold 7324693.00448
min_scaffold 10129
scaffolds_greater_than_10Mb 45
scaffolds_greater_than_100kb 133
number of scaffolds 223
In [49]:
statDF = pd.DataFrame(dtGenome.stats,index=['statistic'])
statDF
Out[49]:
N50 N90 max_scaffold mean_scaffold median_scaffold min_scaffold number of scaffolds perc_A perc_C perc_G ... perc_T perc_genome_covered_by_scaffolds_greater_than_100kb perc_genome_covered_by_scaffolds_greater_than_10kb perc_genome_covered_by_scaffolds_greater_than_1mb scaffolds_greater_than_100Mb scaffolds_greater_than_100kb scaffolds_greater_than_10Mb scaffolds_greater_than_10kb scaffolds_greater_than_1Mb total bases
statistic 32220929 8340160 85027298 7.324693e+06 296338 10129 223 27.049532 20.123188 20.129348 ... 27.050599 99.830771 100.0 98.841232 0 133 45 223 90 1633406540

1 rows × 21 columns

Here I show the basic statistics gathered by the FASTA class.

In [50]:
statDF['non_n_GC%'] = statDF[['perc_G','perc_C']].T.sum() / statDF[['perc_A','perc_G','perc_C','perc_T']].T.sum() * 100.0
statDF['non_n_AT%'] = statDF[['perc_A','perc_T']].T.sum() / statDF[['perc_A','perc_G','perc_C','perc_T']].T.sum() * 100.0

statDF = statDF.T
statDF['statistic'] = statDF['statistic'].apply(lambda x: '%.5f' % x)
statDF
Out[50]:
statistic
N50 32220929.00000
N90 8340160.00000
max_scaffold 85027298.00000
mean_scaffold 7324693.00448
median_scaffold 296338.00000
min_scaffold 10129.00000
number of scaffolds 223.00000
perc_A 27.04953
perc_C 20.12319
perc_G 20.12935
perc_N 5.64733
perc_T 27.05060
perc_genome_covered_by_scaffolds_greater_than_100kb 99.83077
perc_genome_covered_by_scaffolds_greater_than_10kb 100.00000
perc_genome_covered_by_scaffolds_greater_than_1mb 98.84123
scaffolds_greater_than_100Mb 0.00000
scaffolds_greater_than_100kb 133.00000
scaffolds_greater_than_10Mb 45.00000
scaffolds_greater_than_10kb 223.00000
scaffolds_greater_than_1Mb 90.00000
total bases 1633406540.00000
non_n_GC% 42.66179
non_n_AT% 57.33821
In [51]:
statDF = statDF.reset_index()
In [52]:
statDF.columns = ['statistic', 'value']
In [53]:
statDF
Out[53]:
statistic value
0 N50 32220929.00000
1 N90 8340160.00000
2 max_scaffold 85027298.00000
3 mean_scaffold 7324693.00448
4 median_scaffold 296338.00000
5 min_scaffold 10129.00000
6 number of scaffolds 223.00000
7 perc_A 27.04953
8 perc_C 20.12319
9 perc_G 20.12935
10 perc_N 5.64733
11 perc_T 27.05060
12 perc_genome_covered_by_scaffolds_greater_than_100kb 99.83077
13 perc_genome_covered_by_scaffolds_greater_than_10kb 100.00000
14 perc_genome_covered_by_scaffolds_greater_than_1mb 98.84123
15 scaffolds_greater_than_100Mb 0.00000
16 scaffolds_greater_than_100kb 133.00000
17 scaffolds_greater_than_10Mb 45.00000
18 scaffolds_greater_than_10kb 223.00000
19 scaffolds_greater_than_1Mb 90.00000
20 total bases 1633406540.00000
21 non_n_GC% 42.66179
22 non_n_AT% 57.33821
In [54]:
#statDF.to_csv('../data/supplemental_data/supplemental_table_5.csv',index=False)

Next, I use the module to calculate N50-like (Nx) scores, which I plot in the next figure.

In [55]:
nxDF = pd.DataFrame(dtGenome.return_nx_dist()).T
nxDF.columns=['perc','size']
nxDF = nxDF.sort_values('perc')
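
The internals of return_nx_dist() are not shown here. Assuming it follows the usual Nx definition (the largest scaffold length L such that scaffolds of length L or longer together cover at least x% of the total bases), a minimal sketch looks like this. The function name nx and the toy lengths are hypothetical.

```python
def nx(lengths, x):
    """Return the Nx scaffold length for a list of scaffold lengths."""
    total = sum(lengths)
    threshold = total * x / 100.0
    running = 0
    # Walk from the largest scaffold down until x% of bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    raise ValueError('no scaffolds given, or x is greater than 100')


# Toy scaffold lengths, invented for illustration (total = 290).
toy_lengths = [80, 70, 50, 40, 30, 20]
n50 = nx(toy_lengths, 50)
n90 = nx(toy_lengths, 90)
print(n50, n90)
```

Evaluating nx over a range of x values gives a curve like the one plotted in the next cell.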
In [56]:
ax1 = nxDF.plot('perc','size', fontsize=minorFontSize, legend=False, linewidth=0.8, figsize=(3.6,3.4))

ax1.get_yaxis().get_major_formatter().set_scientific(False)
ax1.get_yaxis().set_major_formatter(FuncFormatter(lambda x, p: format(int(x/1000000), ',')))
ax1.set_ylabel('Scaffold Size (MB)',fontsize=minorFontSize)


ax1.set_xlabel('Percentage of Total Bases in Genome (%)',fontsize=minorFontSize)
ax1.set_xticks(np.arange(0,120,20))

ax1.vlines(x=50,color ='r',ymin=0,ymax=100000000,linestyle='--', linewidth=0.8)
ax1.hlines(y=32220929,color ='r',xmin=0,xmax=100,linestyle='--', linewidth=0.8)
ax1.annotate('N50 = 32.22 MB',xy=(15,28000000),fontsize=minorFontSize)

ax1.vlines(x=90,color ='g',ymin=0,ymax=100000000,linestyle='--', linewidth=0.8)
ax1.hlines(y=8340160,color ='g',xmin=0,xmax=100,linestyle='--', linewidth=0.8)
ax1.annotate('N90 = 8.34 MB',xy=(55,4500000),fontsize = minorFontSize)

ax1.set_ylim(0,90000000)


ax1.get_xaxis().tick_bottom()
ax1.get_yaxis().tick_left()

fig = ax1.get_figure()
#fig.savefig('../fig2/Figure3A.pdf', bbox_inches='tight')

Here I load in the sizes of each scaffold in the genome. These are used multiple times later in analysis.

In [57]:
scaffoldSizes = pd.read_csv('/n/projects/dut/a_marmorata/dovetail_genome_delivery/data/fastas/scaffold_sizes.clean.tsv',sep='\t',names=['scaffold','scaffold_size'])
scaffoldSizes = scaffoldSizes.sort_values('scaffold_size',ascending=False)
orderedScaffolds = scaffoldSizes.scaffold.tolist()
scaffoldSizeDict = dict(zip(scaffoldSizes.scaffold,scaffoldSizes.scaffold_size))

Isochore Analysis

Full details on the isochore analysis are included in a separate notebook; the project is located at "../../marmorata_isochores".

In [58]:
#See notebook at ../../marmorata_isochores for full details on analysis
In [59]:
isochoreFiles = glob.glob('../../marmorata_isochores/data/*window.tsv')
isochoreFiles[:10]
Out[59]:
['../../marmorata_isochores/data/Gallus_gallus.Gallus_gallus-5.0.dna.toplevel.gc_5000_scaffold_window.tsv',
 '../../marmorata_isochores/data/Pelodiscus_sinensis.PelSin_1.0.dna.toplevel.gc_10000_scaffold_window.tsv',
 '../../marmorata_isochores/data/Danio_rerio.GRCz10.dna.toplevel.gc_80000_scaffold_window.tsv',
 '../../marmorata_isochores/data/tigris_scaffolds_filt_10000.gc_160000_scaffold_window.tsv',
 '../../marmorata_isochores/data/Pelodiscus_sinensis.PelSin_1.0.dna.toplevel.gc_160000_scaffold_window.tsv',
 '../../marmorata_isochores/data/Danio_rerio.GRCz10.dna.toplevel.gc_320000_scaffold_window.tsv',
 '../../marmorata_isochores/data/Homo_sapiens.GRCh38.dna.primary_assembly.gc_20000_scaffold_window.tsv',
 '../../marmorata_isochores/data/GCA_000186305.2_Python_molurus_bivittatus-5.0.2_genomic.gc_160000_scaffold_window.tsv',
 '../../marmorata_isochores/data/GCA_000186305.2_Python_molurus_bivittatus-5.0.2_genomic.gc_40000_scaffold_window.tsv',
 '../../marmorata_isochores/data/GCA_000186305.2_Python_molurus_bivittatus-5.0.2_genomic.gc_80000_scaffold_window.tsv']
In [60]:
list_std = []
isochorePathDict = {}
for path in isochoreFiles:
    sample = path.split('/')[-1].split('.')[:2]
    window_size = int(path.split('_')[-3])
    if 'Python' in sample[1]:
        sample = sample[1]
        
    elif 'tigris' in sample[0]:
        sample = 'Aspidoscelis_marmorata'
    else:
        sample = sample[0]
    name = sample.replace('2_','').replace('-5','').replace('_', ' ')
    isochorePathDict[(name,window_size)]=path
    windows_df = pd.read_csv(path, sep='\t', names=['scaffold', 'start', 'stop', 'perc_gc'])
    windows_df.dropna(inplace=True)
    amount_data = len(windows_df) * window_size
    std = windows_df.perc_gc.std()
    dict_data = {'organism':sample, 'window_size':window_size, 'bases_analyzed':amount_data, 'std_dev':std}
    list_std.append(dict_data)

    
gcSTD = pd.DataFrame(list_std)
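
The window files read above were produced in the separate isochore notebook, so their generation is not shown here. As a sketch of the underlying idea, the snippet below computes percent GC in non-overlapping windows and then the sample standard deviation of those values (pandas' .std() also uses the sample form by default). Skipping windows that contain N is an assumption on my part, as is the helper name gc_percent_windows.

```python
import math


def gc_percent_windows(seq, window_size):
    """Percent GC for each full, non-overlapping window of a sequence."""
    values = []
    for start in range(0, len(seq) - window_size + 1, window_size):
        window = seq[start:start + window_size].upper()
        if 'N' in window:
            continue  # assumption: windows containing gaps are skipped
        gc = window.count('G') + window.count('C')
        values.append(100.0 * gc / window_size)
    return values


def sample_std(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    mean = sum(values) / float(len(values))
    variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return math.sqrt(variance)


# Toy sequence, invented for illustration.
toy = 'GGGGAAAACCGGAATT'
windows = gc_percent_windows(toy, 4)
spread = sample_std(windows)
print(windows, round(spread, 3))
```

A genome with strong isochore structure keeps a relatively high standard deviation as the window size grows, which is what the following plot compares across species.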

Here I plot the standard deviation of GC content across various genomes with different window sizes. The data show that Aspidoscelis marmoratus has an isochore structure similar to that of the python, turtle, and mouse.

In [61]:
fig = plt.figure(1, figsize=(3.4, 3.4))
ax2 = fig.add_subplot(111)
organism_map = {'Gallus_gallus': 'Chicken', '2_Python_molurus_bivittatus-5': 'Burmese python', 'Aspidoscelis_marmorata': 'Aspidoscelis marmoratus', 
                'Pelodiscus_sinensis': 'Chinese softshell turtle', 'Homo_sapiens': 'Human', 'Canis_familiaris': 'Dog', 'Mus_musculus': 'Mouse', 
                'Danio_rerio': 'Zebrafish', 'Anolis_carolinensis': 'Carolina anole lizard'}

markers=['--o', '--v', '--^', '--<', '-->', '--8', '--s', '--p', '--h', '--H', '--D', '--d']
for i, organism in enumerate(gcSTD.organism.unique()):
    name = organism_map[organism]
    gc_dist = gcSTD[gcSTD.organism ==organism].copy()
    gc_dist.sort_values('window_size',inplace=True)
    ax2.plot(gc_dist['window_size'],
            gc_dist['std_dev'], 
            markers[i],
            label=name,
            markersize=4,
            linewidth=0.8)

ax2.set_xscale("log")
ax2.set_xticks([10] + gc_dist['window_size'].tolist())
ax2.get_xaxis().set_major_formatter(matplotlib.ticker.ScalarFormatter())
ax2.set_xlim(2770,330000)
xtick_labels = [str(int(float(x)/1000.0)) for x in ax2.get_xticks()]
ax2.set_xticklabels(xtick_labels, fontsize=minorFontSize)

ax2.set_yticks(np.arange(0,11,1.0))
ax2.set_yticklabels(ax2.get_yticks(),fontsize=minorFontSize)

ax2.set_ylim(0,8)
ax2.set_ylabel('% GC Standard Deviation', fontsize=minorFontSize)
ax2.set_xlabel('log(Window Size(kb))', fontsize=minorFontSize)
ax2.legend(loc=5,bbox_to_anchor=(1.7, 0.5), prop={'size':8})

ax2.get_xaxis().tick_bottom()
ax2.get_yaxis().tick_left()

#fig.savefig('../fig2/Figure3B.pdf', bbox_inches='tight')

BUSCO Analysis (Done by Rutendo Sigauke)

In [62]:
%%bash
cp /n/projects/rfs/lizard/marmorata/busco_analysis/figures/busco_summary_figure.pdf ../fig2/SupplementalFigure1A.pdf
In [63]:
Image('/n/projects/rfs/lizard/marmorata/busco_analysis/figures/busco_summary_figure.png', height=900, width=900)
Out[63]:
In [64]:
%%bash
cp /n/projects/rfs/lizard/marmorata/busco_analysis/figures/phylogeny_1536BUSCOs_Gblocks_100bootstrap_Arial_Common_Name.pdf ../fig2/SupplementalFigure1B.pdf
In [65]:
Image('/n/projects/rfs/lizard/marmorata/busco_analysis/figures/phylogeny_1536BUSCOs_Gblocks_100bootstrap_Arial_Common_Name.png', height=900, width=900)
Out[65]:

Synteny Analysis (Done by Rutendo Sigauke)

In [66]:
%%bash
cp /n/projects/rfs/lizard/marmorata/synteny/figures/Marmorata_Anole_synteny_99score_allScaff_Arial_common_name.pdf ../fig2/SupplementalFigure1C.pdf
In [67]:
Image('/n/projects/rfs/lizard/marmorata/synteny/figures/Marmorata_Anole_synteny_99score_allScaff_Arial_common_name.png', height=600, width=600)
Out[67]:
In [68]:
%%bash
cp /n/projects/rfs/lizard/marmorata/synteny/figures/Marmorata_Chicken_synteny_99score_allScaff_Arial_common_names.pdf ../fig2/SupplementalFigure1D.pdf
In [69]:
Image('/n/projects/rfs/lizard/marmorata/synteny/figures/Marmorata_Chicken_synteny_99score_allScaff_Arial_common_names.png', height=600, width=600)
Out[69]:

Repeat Analysis (Done by Rutendo Sigauke)

In [70]:
%%bash
cp /n/projects/rfs/lizard/marmorata/repeats/analysis/figures/repeats_genomes_summary_common_names_Ariel.pdf ../fig2/SupplementalFigure2.pdf
In [71]:
Image('/n/projects/rfs/lizard/marmorata/repeats/analysis/figures/repeats_genomes_summary_common_names_Ariel.png', height=600, width=600)
Out[71]:
In [72]:
%%bash 
cp /n/projects/rfs/lizard/marmorata/repeats/repeat_masker/repeat_summaries/RepeatFamilySummariesPercentOfGenome.tsv ../data/supplemental_data/supplemental_table_6.tsv
In [73]:
pd.read_csv('../data/supplemental_data/supplemental_table_6.tsv', sep='\t').tail()
Out[73]:
repeats marmorata anole xenopus gallus python
67 SINE/tRNA-Sauria-L2 0.000000 1.134833 0.000000 0.000000 0.011473
68 SINE/tRNA-Sauria-RTE 0.000000 0.102815 0.000000 0.000000 0.099582
69 SINE/U 0.000000 0.000624 0.011249 0.000000 0.000000
70 snRNA 0.004796 0.000000 0.009599 0.000000 0.001264
71 Unknown 20.640800 7.684774 5.144897 0.531585 8.898379

Load Annotations

The following Python module can be used to manipulate and plot the data contained in a GFF3-formatted file.

In [74]:
# %load ../bin/gff3_plotting.py
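
The gff3_plotting.py module itself is not reproduced in this notebook. For orientation, the ninth column of a GFF3 record is a semicolon-separated list of key=value pairs, and a minimal sketch of splitting it into a dictionary is shown below. The helper name parse_gff3_attributes is hypothetical and is not a function from the module; the attribute string is a shortened version of an AUGUSTUS record from the annotation file used later in this notebook.

```python
def parse_gff3_attributes(column9):
    """Split a GFF3 attribute column into a {key: value} dictionary."""
    attrs = {}
    for field in column9.strip().split(';'):
        if not field:
            continue  # a trailing ';' leaves an empty field
        key, _, value = field.partition('=')
        attrs[key] = value
    return attrs


record = 'ID=ATIG_ab_00023958-RA;Parent=ATIG_ab_00023958;Name=;Alias=g31007.t1;'
gff_attrs = parse_gff3_attributes(record)
print(gff_attrs['ID'], gff_attrs['Parent'])
```

This sketch does not decode URL-escaped characters, which the GFF3 format allows inside attribute values.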

Here I load in the genome annotations. These are used later in the HOX gene analysis and the vmnr analysis.

In [75]:
gffMarmAbInPath='../data/augustus.renamed.putative_function.iprscan.gff3'
gffMarmPath = '../data/a_tigris_1.renamed.putative_function.iprscan.wintrons.gff'
gffMarmDF= gff3.gff3_to_attribute_df(gff_path=gffMarmPath,ensembl=False)
gffMarmDF = gff3.add_gene_symbol(gffMarmDF, ensembl=False)
gffMarmDF = gff3.add_gene_id(gffMarmDF, ensembl=False)
gffMarmDF = gff3.add_gene_name(gffMarmDF, ensembl=False)
gffMarmDF['count_column'] = gffMarmDF.apply(lambda row: (row['seqid'], row['gene_symbol']), axis=1)
gffMarmDF['source'] = 'MAKER2'
691299 691299
In [76]:
gffMarmAbInDF= gff3.gff3_to_attribute_df(gff_path=gffMarmAbInPath,ensembl=False)
gffMarmAbInDF = gff3.add_gene_symbol(gffMarmAbInDF, ensembl=False)
gffMarmAbInDF = gff3.add_gene_id(gffMarmAbInDF, ensembl=False)
gffMarmAbInDF = gff3.add_gene_name(gffMarmAbInDF, ensembl=False)
gffMarmAbInDF['count_column'] = gffMarmAbInDF.apply(lambda row: (row['seqid'], row['gene_symbol']), axis=1)
gffMarmAbInDF['source']='BRAKER1'
1737575 1737575
In [77]:
supplementalAnnotDF = pd.concat(
    [
        gffMarmAbInDF[['source','feature']], 
        gffMarmDF[['source','feature']]
    ]
     )
supplementalAnnotDF['count'] =1
In [78]:
supplementalAnnotDF = supplementalAnnotDF.groupby(['source','feature']).sum().reset_index()
supplementalAnnotDF
Out[78]:
source feature count
0 BRAKER1 CDS 424058
1 BRAKER1 exon 424058
2 BRAKER1 gene 152361
3 BRAKER1 intron 267602
4 BRAKER1 mRNA 156525
5 BRAKER1 start_codon 156457
6 BRAKER1 stop_codon 156514
7 MAKER2 CDS 222450
8 MAKER2 exon 164669
9 MAKER2 five_prime_UTR 22402
10 MAKER2 gene 25856
11 MAKER2 intron 189732
12 MAKER2 mRNA 44461
13 MAKER2 three_prime_UTR 21729
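
Summing a column of ones within each (source, feature) group, as done above, is the same as counting the rows in each group. A minimal standard-library sketch of that equivalence, using a few invented rows rather than the real annotation tables:

```python
from collections import Counter

# Invented (source, feature) pairs standing in for rows of the annotation tables.
rows = [
    ('BRAKER1', 'gene'),
    ('BRAKER1', 'mRNA'),
    ('BRAKER1', 'gene'),
    ('MAKER2', 'gene'),
    ('MAKER2', 'exon'),
]

# Counting pairs gives the same result as adding a column of ones and summing per group.
feature_counts = Counter(rows)
for (source, feature), count in sorted(feature_counts.items()):
    print(source, feature, count)
```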
In [79]:
supplementalAnnotDF = supplementalAnnotDF.T
In [80]:
supplementalAnnotDF = supplementalAnnotDF.reset_index()
supplementalAnnotDF = supplementalAnnotDF.rename(columns=supplementalAnnotDF.iloc[1])
supplementalAnnotDF = supplementalAnnotDF.reindex(supplementalAnnotDF.index.drop(1))
In [81]:
supplementalAnnotDF
Out[81]:
feature CDS exon gene intron mRNA start_codon stop_codon CDS exon five_prime_UTR gene intron mRNA three_prime_UTR
0 source BRAKER1 BRAKER1 BRAKER1 BRAKER1 BRAKER1 BRAKER1 BRAKER1 MAKER2 MAKER2 MAKER2 MAKER2 MAKER2 MAKER2 MAKER2
2 count 424058 424058 152361 267602 156525 156457 156514 222450 164669 22402 25856 189732 44461 21729
In [82]:
supplementalAnnotDF.T.to_csv('../data/supplemental_data/supplemental_table_7.csv',index=True,header=False)

HOX Gene Analysis

Here I link to the braker ab initio gene predictions

In [83]:
%%bash
cd ../data
#ln -s ../../dovetail_braker_annotation/data/augustus.renamed.putative_function.iprscan.gff3

Next, I downloaded annotation sets for human, mouse, and Anolis. I was originally going to compare annotations directly in the GFF files, but I decided not to because the annotations for mouse and human have far more supporting evidence than those for Anolis.

In [84]:
%%bash
cd ../data
#wget ftp://ftp.ensembl.org/pub/release-87/gff3/homo_sapiens/Homo_sapiens.GRCh38.87.abinitio.gff3.gz
#wget ftp://ftp.ensembl.org/pub/release-87/gff3/homo_sapiens/Homo_sapiens.GRCh38.87.chr.gff3.gz
#gunzip Homo_sapiens.GRCh38.87.abinitio.gff3.gz
#gunzip Homo_sapiens.GRCh38.87.chr.gff3.gz
In [85]:
%%bash
cd ../data
#wget ftp://ftp.ensembl.org/pub/release-87/gff3/mus_musculus/Mus_musculus.GRCm38.87.chr.gff3.gz
#wget ftp://ftp.ensembl.org/pub/release-87/gff3/mus_musculus/Mus_musculus.GRCm38.87.abinitio.gff3.gz
#gunzip Mus_musculus.GRCm38.87.chr.gff3.gz
#gunzip Mus_musculus.GRCm38.87.abinitio.gff3.gz
In [86]:
%%bash
cd ../data
#wget ftp://ftp.ensembl.org/pub/release-87/gff3/anolis_carolinensis/Anolis_carolinensis.AnoCar2.0.87.chr.gff3.gz
#wget ftp://ftp.ensembl.org/pub/release-87/gff3/anolis_carolinensis/Anolis_carolinensis.AnoCar2.0.87.abinitio.gff3.gz
#gunzip Anolis_carolinensis.AnoCar2.0.87.chr.gff3.gz
#gunzip Anolis_carolinensis.AnoCar2.0.87.abinitio.gff3.gz
In [87]:
%%bash
cd ../data
#wget ftp://ftp.ensembl.org/pub/release-87/gff3/anolis_carolinensis/Anolis_carolinensis.AnoCar2.0.87.gff3.gz
#gunzip Anolis_carolinensis.AnoCar2.0.87.gff3.gz

Here I am searching for the HOX gene clusters and plotting them using the python script loaded into this notebook (gff3).

In [88]:
for c in ['a','b','c','d']:
    #Aspidoscelis marmorata
    regex = 'hox%s|evx'%c
    organism = 'Aspidoscelis marmoratus'
    fig, ax = gff3.plot_single_cluster_series(gff3_paths=['../data/augustus.renamed.putative_function.iprscan.gff3'], subset_regex=regex, species=organism, path_targ='augustus')
    #fig.savefig('../fig/%s_hox-%s.pdf'%(organism, c), bbox_inches='tight', pad_inches=0.2)
    plt.show()
1737575 1737575
1737575
165
1737575 1737575
1737575
171
1737575 1737575
1737575
168
1737575 1737575
1737575
200

Here I subset the scaffolds that contain the HOX gene clusters, in case they were needed for the supplemental material. I also extracted the annotations for just those scaffolds from the BRAKER GFF3 gene predictions.

In [89]:
hoxScaffolds = ['>Scpiz6a_1','>Scpiz6a_37','>Scpiz6a_86','>Scpiz6a_30.1']
hoxGenomeDict = {k:v for k,v in dtGenome.fasta_dict.items() if k in hoxScaffolds}
fa.write_two_line_fasta('../data/hox_scaffolds.fa', list(hoxGenomeDict.items()))
In [90]:
augHoxGFFDF = pd.read_csv('../data/augustus.renamed.putative_function.iprscan.gff3',sep='\t',header=None)
augHoxGFFDF = augHoxGFFDF[augHoxGFFDF[0].isin([s.strip('>') for s in hoxScaffolds])]
augHoxGFFDF.to_csv('../data/aug_hox_annotations.gff3',sep='\t')
augHoxGFFDF.head()
Out[90]:
0 1 2 3 4 5 6 7 8
348850 Scpiz6a_37 AUGUSTUS gene 3902 4216 0.59 + . ID=ATIG_ab_00023958;Name=;Alias=g31007;Note=Similar to NSMF: NMDA receptor synaptonuclear signaling and neuronal migration factor (Homo sapiens);
348851 Scpiz6a_37 AUGUSTUS mRNA 3902 4216 0.59 + . ID=ATIG_ab_00023958-RA;Parent=ATIG_ab_00023958;Name=;Alias=g31007.t1;Note=Similar to NSMF: NMDA receptor synaptonuclear signaling and neuronal migration factor (Homo sapiens);
348852 Scpiz6a_37 AUGUSTUS start_codon 3902 3904 . + 0 Parent=ATIG_ab_00023958-RA;
348853 Scpiz6a_37 AUGUSTUS CDS 3902 4216 0.59 + 0 ID=g31007.t1.CDS1;Parent=ATIG_ab_00023958-RA;
348854 Scpiz6a_37 AUGUSTUS exon 3902 4216 . + . ID=g31007.t1.exon1;Parent=ATIG_ab_00023958-RA;

Because the above figures look a bit messy, I decided to make a simple schematic for the text. I start by pulling out all the gene coordinates for the above figures.

In [91]:
%%bash
cd ../data
#awk '$3=="gene"' augustus.renamed.putative_function.iprscan.gff3 | grep -E -i ' hox|evx' | awk  '$1=="Scpiz6a_86" || $1=="Scpiz6a_30.1" || $1=="Scpiz6a_37" || $1=="Scpiz6a_1"' | sed 's/Similar to /\t/' | cut -f -5,7,10 | tr ':' '\t' | cut -f -7 > hox_gene_coords.tsv
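
For readers who find the shell pipeline above hard to follow, here is a rough Python equivalent, written as a sketch rather than the code actually used. It keeps gene records on the four HOX scaffolds whose line matches " hox" or "evx" (case-insensitive) and takes the gene symbol from the text between "Similar to " and the following colon. The names hox_gene_rows, HOX_SCAFFOLDS, HOX_PATTERN, and SYMBOL_PATTERN are hypothetical, and the first example record uses the HOXC4 coordinates from this notebook with an invented ID and Note.

```python
import re

HOX_SCAFFOLDS = {'Scpiz6a_1', 'Scpiz6a_37', 'Scpiz6a_86', 'Scpiz6a_30.1'}
HOX_PATTERN = re.compile(r' hox|evx', re.IGNORECASE)
SYMBOL_PATTERN = re.compile(r'Similar to ([^:;]+):')


def hox_gene_rows(lines):
    """Return (seqid, source, type, start, stop, strand, name) for HOX/EVX genes."""
    rows = []
    for line in lines:
        if line.startswith('#'):
            continue
        cols = line.rstrip('\n').split('\t')
        if len(cols) != 9 or cols[2] != 'gene':
            continue
        if cols[0] not in HOX_SCAFFOLDS or not HOX_PATTERN.search(line):
            continue
        match = SYMBOL_PATTERN.search(cols[8])
        symbol = match.group(1) if match else ''
        rows.append((cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]), cols[6], symbol))
    return rows


example_lines = [
    # Invented attributes; only the coordinates and symbol mirror the notebook.
    '\t'.join(['Scpiz6a_37', 'AUGUSTUS', 'gene', '9474154', '9474759', '.', '-', '.',
               'ID=example_gene;Note=Similar to HOXC4: example description;']),
    # Non-gene feature: skipped.
    '\t'.join(['Scpiz6a_37', 'AUGUSTUS', 'mRNA', '9474154', '9474759', '.', '-', '.',
               'ID=example_gene-RA;Note=Similar to HOXC4: example description;']),
    # Gene on a HOX scaffold that is not a HOX/EVX gene: skipped.
    '\t'.join(['Scpiz6a_37', 'AUGUSTUS', 'gene', '3902', '4216', '0.59', '+', '.',
               'ID=ATIG_ab_00023958;Name=;Alias=g31007;Note=Similar to NSMF: example;']),
]
hox_rows = hox_gene_rows(example_lines)
print(hox_rows)
```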

I also independently found HOXB13 in the MAKER annotations by manually blasting a nearby model. I next added the coordinates from that model to the file I generated above.

In [92]:
%%bash
cd ../data
#grep ATIG_00008349 a_tigris_1.renamed.putative_function.iprscan.wintrons.gff  | awk '$3=="gene"' | cut -f -5,7 | tr '\n' '~' | sed "s/~/@Hoxb13~/" | tr '~' '\n' | tr '@' '\t' >> hox_gene_coords.tsv

I found later on that hoxc3 is present in the Anolis lizard (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2665779/). I couldn't find a copy of the sequence mentioned in the paper, but I was able to find a sequence for Xenopus HOXC3. I generated the following curl command directly from the Ensembl website.

In [93]:
%%bash
cd ../data
#curl --header 'Host: www.ensembl.org' \
#--header 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:52.0) Gecko/20100101 Firefox/52.0' \
#--header 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' \
#--header 'Accept-Language: en-US,en;q=0.5' \
#--header 'Referer: http://www.ensembl.org/Xenopus_tropicalis/Gene/Sequence?db=core;g=ENSXETG00000025181;r=GL172862.1:552422-558412;t=ENSXETT00000053914' \
#--header 'Cookie: _ga=GA1.2.780793432.1475001731; ENSEMBL_WIDTH=2300; DYNAMIC_WIDTH=1; _pk_ref..c10e=%5B%22%22%2C%22%22%2C1481326908%2C%22http%3A%2F%2Fuseast.ensembl.org%2FAnolis_carolinensis%2FInfo%2FIndex%22%5D; _pk_id..c10e=6f2a23a1cab0d268.1481326908.1.1481327562.1481326908.; ENSEMBL_HINX_SESSION=0ea132a2ddae24e00aa90673589a5dc90ab37997b5a94c19eec2634e; ENSEMBL_GENOVERSE_SCROLL=0; _gid=GA1.2.2055248845.1494971795; ENSEMBL_REGION_PAN=1; ENSEMBL_SEARCH=ensembl_all; _gali=KDbFOwdW_3; _gat=1' \
#--header 'DNT: 1' \
#--header 'Connection: keep-alive' \
#--header 'Upgrade-Insecure-Requests: 1' 'http://www.ensembl.org/Xenopus_tropicalis/Download/DataExport?compression=;db=core;file=temporary/2017_05_16/session_295535549/HAUljTwe/Xenopus_tropicalis_hoxc3_sequence.fa;filename=Xenopus_tropicalis_hoxc3_sequence.fa;format=FASTA;g=ENSXETG00000025181;r=GL172862.1:552422-558412;t=ENSXETT00000053914' \
#-o 'Xenopus_tropicalis_hoxc3_sequence.fa' \
#-L
cat ../data/Xenopus_tropicalis_hoxc3_sequence.fa
>hoxc3-201 peptide: ENSXETP00000053914 pep:KNOWN_protein_coding
MPKSLFYDNSASFGGCGFQGNNGMGYLGQQDYPSEDDYQPPFCLPPDTANGSASHKGEHS
IKGIDFHLSEVSEQAQQPKSPNSDSPLPKSASTQSCTSKKSTGPVSSDVTSPNKKSKGSN
MPKQIFPWMKETRQNSKQKKQAPPPAEEACKVREKNTGSFSASKRARTAYTNSQLVELEK
EFHFNRYLCRPRRLEMAKLLNLSERQIKIWFQNRRMKFKKDHKGKGGGGSPGGLSPSSSP
SLMPYSGNLPLDGDCGYEVPMATGAYNKSPGNMYGLTAYSAPLFEGPSAQKRYGPQSLAP
EYDPHSMQGDNNYDTSGLPNGQGYLGNYLENGSESCSMFSLPHPSSESMDYSCAAQTPSK
HHLGPCDPHPTYTDLHIHPVPQACSQEPPVLTHL

In [94]:
%%bash
#exonerate --model protein2genome \
#--showvulgar no \
#--showalignment no \
#--showtargetgff yes \
#--maxintron 50000 \
#--ryo ">%qi length=%ql alnlen=%qaln>%ti length=%tl alnlen=%taln" \
#--query ../data/Xenopus_tropicalis_hoxc3_sequence.fa \
#--target ../data/tigris_scaffolds_filt_10000.fa \
#> Xenopus_tropicalis_hoxc3_to_marmoratus_genome.out &
#mv Xenopus_tropicalis_hoxc3_to_marmoratus_genome.out ../data
In [95]:
%%bash
cd ../data
#grep -v '^#' Xenopus_tropicalis_hoxc3_to_marmoratus_genome.out | grep -v '^>' | grep -v 'bonsai.sgc.loc' | grep -v '^Command line:' > Xenopus_tropicalis_hoxc3_to_marmoratus_genome.gff
In [96]:
xenopusHoxcHits = pd.read_csv('../data/Xenopus_tropicalis_hoxc3_to_marmoratus_genome.gff',sep='\t', names=['seqid', 'source', 'feature', 'start', 'end', 'score', 'strand', 'phase', 'attributes'])
xenopusHoxcHits.head()
Out[96]:
seqid source feature start end score strand phase attributes
0 Scpiz6a_49 exonerate:protein2genome:local gene 19691975 19692514 198 + . gene_id 1 ; sequence hoxc3-201 ; gene_orientation + ; identity 35.40 ; similarity 50.31
1 Scpiz6a_49 exonerate:protein2genome:local cds 19691975 19692248 . + . NaN
2 Scpiz6a_49 exonerate:protein2genome:local exon 19691975 19692248 . + . insertions 0 ; deletions 0 ; identity 43.96 ; similarity 57.14
3 Scpiz6a_49 exonerate:protein2genome:local splice5 19692249 19692250 . + . intron_id 1 ; splice_site "TC"
4 Scpiz6a_49 exonerate:protein2genome:local intron 19692249 19692302 . + . intron_id 1

The second best hit appears in the right area.

In [97]:
xenopusHoxcHits[(xenopusHoxcHits.feature == 'gene') &(xenopusHoxcHits.seqid == 'Scpiz6a_37')].sort_values('score',ascending=False).head()
Out[97]:
seqid source feature start end score strand phase attributes
391 Scpiz6a_37 exonerate:protein2genome:local gene 9403550 9404943 346 - . gene_id 1 ; sequence hoxc3-201 ; gene_orientation + ; identity 60.80 ; similarity 72.00
400 Scpiz6a_37 exonerate:protein2genome:local gene 9473492 9474480 243 - . gene_id 2 ; sequence hoxc3-201 ; gene_orientation + ; identity 40.85 ; similarity 56.34
409 Scpiz6a_37 exonerate:protein2genome:local gene 9500001 9500189 228 - . gene_id 3 ; sequence hoxc3-201 ; gene_orientation . ; identity 71.43 ; similarity 82.54
413 Scpiz6a_37 exonerate:protein2genome:local gene 9504971 9505144 205 - . gene_id 4 ; sequence hoxc3-201 ; gene_orientation . ; identity 67.24 ; similarity 81.03
417 Scpiz6a_37 exonerate:protein2genome:local gene 9535775 9535966 203 - . gene_id 5 ; sequence hoxc3-201 ; gene_orientation . ; identity 60.94 ; similarity 78.12
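
The exonerate attribute column uses space-separated "key value" pairs joined by " ; " rather than GFF3-style key=value pairs. If the identity or similarity values are needed as numbers (for example, to rank hits by identity instead of raw score), a minimal sketch of parsing that column is shown below. The helper name parse_exonerate_attributes is hypothetical; the example string is the top Scpiz6a_37 hit from the table above.

```python
def parse_exonerate_attributes(text):
    """Parse 'key value ; key value' attribute text into a dictionary of strings."""
    attrs = {}
    for field in text.split(';'):
        parts = field.split()
        if len(parts) >= 2:
            # First token is the key; the rest is the value.
            attrs[parts[0]] = ' '.join(parts[1:])
    return attrs


hit = 'gene_id 1 ; sequence hoxc3-201 ; gene_orientation + ; identity 60.80 ; similarity 72.00'
hit_attrs = parse_exonerate_attributes(hit)
identity = float(hit_attrs['identity'])
similarity = float(hit_attrs['similarity'])
print(hit_attrs['sequence'], identity, similarity)
```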
In [98]:
%%bash
cd ../data
#cat Xenopus_tropicalis_hoxc3_to_marmoratus_genome.gff | awk '$1=="Scpiz6a_37" && $4==9403550 && $3=="gene"' | cut -f -5,7 | tr '\n' '~' | sed "s/~/@Hoxc3~/" | tr '~' '\n' | tr '@' '\t' >> hox_gene_coords.tsv
In [99]:
hoxExonCoords = pd.read_csv('../data/hox_gene_coords.tsv', sep='\t', names=['seqid','source','type','start','stop','strand','name'])
hoxExonCoords
Out[99]:
seqid source type start stop strand name
0 Scpiz6a_37 AUGUSTUS gene 9282120 9288133 - hoxa1
1 Scpiz6a_37 AUGUSTUS gene 9474154 9474759 - HOXC4
2 Scpiz6a_37 AUGUSTUS gene 9499983 9501230 - Hoxc5
3 Scpiz6a_37 AUGUSTUS gene 9504857 9506031 - hoxc6
4 Scpiz6a_37 AUGUSTUS gene 9526250 9528579 - HOXC8
5 Scpiz6a_37 AUGUSTUS gene 9535748 9538860 - Hoxc9
6 Scpiz6a_37 AUGUSTUS gene 9550771 9554871 - HOXC10
7 Scpiz6a_37 AUGUSTUS gene 9562146 9568818 - hoxc11a
8 Scpiz6a_37 AUGUSTUS gene 9589583 9593356 - hoxc12a
9 Scpiz6a_37 AUGUSTUS gene 9593673 9594302 - HOXC12
10 Scpiz6a_37 AUGUSTUS gene 9606524 9610151 - hoxc13a
11 Scpiz6a_37 AUGUSTUS gene 18770207 18772219 - hoxM
12 Scpiz6a_37 AUGUSTUS gene 67494760 67495389 + HOXA4
13 Scpiz6a_30.1 AUGUSTUS gene 12112697 12116784 - EVX1
14 Scpiz6a_30.1 AUGUSTUS gene 12118635 12119576 - EVX1
15 Scpiz6a_30.1 AUGUSTUS gene 12199764 12201101 + HOXA13
16 Scpiz6a_30.1 AUGUSTUS gene 12219716 12221513 + HOXA11
17 Scpiz6a_30.1 AUGUSTUS gene 12233452 12234549 + HOXA10
18 Scpiz6a_30.1 AUGUSTUS gene 12235671 12236180 + Hoxa10
19 Scpiz6a_30.1 AUGUSTUS gene 12243941 12246059 + HOXA9
20 Scpiz6a_30.1 AUGUSTUS gene 12254353 12254898 + HOXA7
21 Scpiz6a_30.1 AUGUSTUS gene 12255141 12256495 + HOXA7
22 Scpiz6a_30.1 AUGUSTUS gene 12263757 12265990 + HOXA6
23 Scpiz6a_30.1 AUGUSTUS gene 12267708 12269129 + HOXA5
24 Scpiz6a_30.1 AUGUSTUS gene 12282223 12283761 + HOXA4
25 Scpiz6a_30.1 AUGUSTUS gene 12303335 12306741 + HOXA3
26 Scpiz6a_30.1 AUGUSTUS gene 12312194 12314293 + HOXA2
27 Scpiz6a_30.1 AUGUSTUS gene 12319290 12323555 + Hoxa1
28 Scpiz6a_30.1 AUGUSTUS gene 45958042 45958654 + HOX3
29 Scpiz6a_1 AUGUSTUS gene 13989488 13990662 + hoxU
30 Scpiz6a_1 AUGUSTUS gene 17298735 17299199 + hoxM
31 Scpiz6a_1 AUGUSTUS gene 25629515 25631702 - Hoxd1
32 Scpiz6a_1 AUGUSTUS gene 25663228 25667221 - hoxd3a
33 Scpiz6a_1 AUGUSTUS gene 25692499 25693667 - HOXD4
34 Scpiz6a_1 AUGUSTUS gene 25722293 25724423 - Hoxd8
35 Scpiz6a_1 AUGUSTUS gene 25730347 25732122 - Hoxd9
36 Scpiz6a_1 AUGUSTUS gene 25732652 25739472 - Hoxd10
37 Scpiz6a_1 AUGUSTUS gene 25749545 25750504 - HOXD11
38 Scpiz6a_1 AUGUSTUS gene 25758704 25759860 - HOXD12
39 Scpiz6a_1 AUGUSTUS gene 25760003 25760716 - HOXD12
40 Scpiz6a_1 AUGUSTUS gene 25770564 25772996 - HOXD13
41 Scpiz6a_1 AUGUSTUS gene 25786550 25789410 + Evx2
42 Scpiz6a_86 AUGUSTUS gene 6207845 6211976 + HOXB9
43 Scpiz6a_86 AUGUSTUS gene 6217173 6222005 + HOXB8
44 Scpiz6a_86 AUGUSTUS gene 6226105 6226683 + HOXB7
45 Scpiz6a_86 AUGUSTUS gene 6227951 6232453 + HOXB7
46 Scpiz6a_86 AUGUSTUS gene 6243660 6246408 + Hoxb6
47 Scpiz6a_86 AUGUSTUS gene 6249716 6250567 + hoxb5a
48 Scpiz6a_86 AUGUSTUS gene 6252113 6263275 + HOXB5
49 Scpiz6a_86 AUGUSTUS gene 6272733 6275177 + HOXB4
50 Scpiz6a_86 AUGUSTUS gene 6308863 6317043 + hoxb3a
51 Scpiz6a_86 AUGUSTUS gene 6328011 6330803 + HOXB2
52 Scpiz6a_86 AUGUSTUS gene 6356743 6360890 + HOXB1
53 Scpiz6a_86 maker gene 6098404 6105626 + Hoxb13
54 Scpiz6a_37 exonerate:protein2genome:local gene 9403550 9404943 - Hoxc3

Here I drop the annotations that I accidentally picked up (hoxU, hoxM, and HOX3).

In [100]:
# Drop the three off-target annotations; .copy() avoids the pandas SettingWithCopyWarning on the assignment below.
hoxExonCoords = hoxExonCoords[~hoxExonCoords.name.isin(['hoxU','hoxM','HOX3'])].copy()
# Upper-case each name and drop a trailing non-digit character (e.g. hoxd3a -> HOXD3).
hoxExonCoords['factor_name'] = hoxExonCoords.name.apply(lambda x: x.upper()[:-1] if x[-1] not in [str(s) for s in range(20)] else x.upper())
hoxExonCoords = hoxExonCoords.sort_values(['seqid', 'start']).reset_index(drop=True)
hoxExonCoords
Out[100]:
seqid source type start stop strand name factor_name
0 Scpiz6a_1 AUGUSTUS gene 25629515 25631702 - Hoxd1 HOXD1
1 Scpiz6a_1 AUGUSTUS gene 25663228 25667221 - hoxd3a HOXD3
2 Scpiz6a_1 AUGUSTUS gene 25692499 25693667 - HOXD4 HOXD4
3 Scpiz6a_1 AUGUSTUS gene 25722293 25724423 - Hoxd8 HOXD8
4 Scpiz6a_1 AUGUSTUS gene 25730347 25732122 - Hoxd9 HOXD9
5 Scpiz6a_1 AUGUSTUS gene 25732652 25739472 - Hoxd10 HOXD10
6 Scpiz6a_1 AUGUSTUS gene 25749545 25750504 - HOXD11 HOXD11
7 Scpiz6a_1 AUGUSTUS gene 25758704 25759860 - HOXD12 HOXD12
8 Scpiz6a_1 AUGUSTUS gene 25760003 25760716 - HOXD12 HOXD12
9 Scpiz6a_1 AUGUSTUS gene 25770564 25772996 - HOXD13 HOXD13
10 Scpiz6a_1 AUGUSTUS gene 25786550 25789410 + Evx2 EVX2
11 Scpiz6a_30.1 AUGUSTUS gene 12112697 12116784 - EVX1 EVX1
12 Scpiz6a_30.1 AUGUSTUS gene 12118635 12119576 - EVX1 EVX1
13 Scpiz6a_30.1 AUGUSTUS gene 12199764 12201101 + HOXA13 HOXA13
14 Scpiz6a_30.1 AUGUSTUS gene 12219716 12221513 + HOXA11 HOXA11
15 Scpiz6a_30.1 AUGUSTUS gene 12233452 12234549 + HOXA10 HOXA10
16 Scpiz6a_30.1 AUGUSTUS gene 12235671 12236180 + Hoxa10 HOXA10
17 Scpiz6a_30.1 AUGUSTUS gene 12243941 12246059 + HOXA9 HOXA9
18 Scpiz6a_30.1 AUGUSTUS gene 12254353 12254898 + HOXA7 HOXA7
19 Scpiz6a_30.1 AUGUSTUS gene 12255141 12256495 + HOXA7 HOXA7
20 Scpiz6a_30.1 AUGUSTUS gene 12263757 12265990 + HOXA6 HOXA6
21 Scpiz6a_30.1 AUGUSTUS gene 12267708 12269129 + HOXA5 HOXA5
22 Scpiz6a_30.1 AUGUSTUS gene 12282223 12283761 + HOXA4 HOXA4
23 Scpiz6a_30.1 AUGUSTUS gene 12303335 12306741 + HOXA3 HOXA3
24 Scpiz6a_30.1 AUGUSTUS gene 12312194 12314293 + HOXA2 HOXA2
25 Scpiz6a_30.1 AUGUSTUS gene 12319290 12323555 + Hoxa1 HOXA1
26 Scpiz6a_37 AUGUSTUS gene 9282120 9288133 - hoxa1 HOXA1
27 Scpiz6a_37 exonerate:protein2genome:local gene 9403550 9404943 - Hoxc3 HOXC3
28 Scpiz6a_37 AUGUSTUS gene 9474154 9474759 - HOXC4 HOXC4
29 Scpiz6a_37 AUGUSTUS gene 9499983 9501230 - Hoxc5 HOXC5
30 Scpiz6a_37 AUGUSTUS gene 9504857 9506031 - hoxc6 HOXC6
31 Scpiz6a_37 AUGUSTUS gene 9526250 9528579 - HOXC8 HOXC8
32 Scpiz6a_37 AUGUSTUS gene 9535748 9538860 - Hoxc9 HOXC9
33 Scpiz6a_37 AUGUSTUS gene 9550771 9554871 - HOXC10 HOXC10
34 Scpiz6a_37 AUGUSTUS gene 9562146 9568818 - hoxc11a HOXC11
35 Scpiz6a_37 AUGUSTUS gene 9589583 9593356 - hoxc12a HOXC12
36 Scpiz6a_37 AUGUSTUS gene 9593673 9594302 - HOXC12 HOXC12
37 Scpiz6a_37 AUGUSTUS gene 9606524 9610151 - hoxc13a HOXC13
38 Scpiz6a_37 AUGUSTUS gene 67494760 67495389 + HOXA4 HOXA4
39 Scpiz6a_86 maker gene 6098404 6105626 + Hoxb13 HOXB13
40 Scpiz6a_86 AUGUSTUS gene 6207845 6211976 + HOXB9 HOXB9
41 Scpiz6a_86 AUGUSTUS gene 6217173 6222005 + HOXB8 HOXB8
42 Scpiz6a_86 AUGUSTUS gene 6226105 6226683 + HOXB7 HOXB7
43 Scpiz6a_86 AUGUSTUS gene 6227951 6232453 + HOXB7 HOXB7
44 Scpiz6a_86 AUGUSTUS gene 6243660 6246408 + Hoxb6 HOXB6
45 Scpiz6a_86 AUGUSTUS gene 6249716 6250567 + hoxb5a HOXB5
46 Scpiz6a_86 AUGUSTUS gene 6252113 6263275 + HOXB5 HOXB5
47 Scpiz6a_86 AUGUSTUS gene 6272733 6275177 + HOXB4 HOXB4
48 Scpiz6a_86 AUGUSTUS gene 6308863 6317043 + hoxb3a HOXB3
49 Scpiz6a_86 AUGUSTUS gene 6328011 6330803 + HOXB2 HOXB2
50 Scpiz6a_86 AUGUSTUS gene 6356743 6360890 + HOXB1 HOXB1
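
The factor_name rule used in this cell upper-cases each name and strips one trailing non-digit character (the paralog letter in names such as hoxd3a). A minimal standalone sketch of that rule follows. It assumes every name ends either in a digit or in exactly one extra letter, and normalize_hox_name is a hypothetical helper that is not part of the original analysis.

```python
def normalize_hox_name(name):
    """Upper-case a gene name and drop one trailing non-digit character.

    Hypothetical helper mirroring the lambda used to build factor_name.
    """
    upper = name.upper()
    # 'hoxd3a' carries a paralog suffix to remove; 'HOXD13' ends in a digit and is kept whole.
    return upper if name[-1].isdigit() else upper[:-1]


examples = ['Hoxd1', 'hoxd3a', 'HOXD13', 'Evx2', 'hoxc11a']
normalized = [normalize_hox_name(n) for n in examples]
print(normalized)
```

A name with a longer suffix (two or more trailing letters) would only lose its last character, so the rule is only safe for annotation names shaped like the ones in this table.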

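There are still a few HOX homologs annotated outside of the clusters: rows 26 (HOXA1) and 38 (HOXA4) both sit on Scpiz6a_37, away from the HOXA cluster on Scpiz6a_30.1. I drop them by index.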

In [101]:
hoxExonCoords = hoxExonCoords.drop(hoxExonCoords.index[[38,26]])
hoxExonCoords
Out[101]:
seqid source type start stop strand name factor_name
0 Scpiz6a_1 AUGUSTUS gene 25629515 25631702 - Hoxd1 HOXD1
1 Scpiz6a_1 AUGUSTUS gene 25663228 25667221 - hoxd3a HOXD3
2 Scpiz6a_1 AUGUSTUS gene 25692499 25693667 - HOXD4 HOXD4
3 Scpiz6a_1 AUGUSTUS gene 25722293 25724423 - Hoxd8 HOXD8
4 Scpiz6a_1 AUGUSTUS gene 25730347 25732122 - Hoxd9 HOXD9
5 Scpiz6a_1 AUGUSTUS gene 25732652 25739472 - Hoxd10 HOXD10
6 Scpiz6a_1 AUGUSTUS gene 25749545 25750504 - HOXD11 HOXD11
7 Scpiz6a_1 AUGUSTUS gene 25758704 25759860 - HOXD12 HOXD12
8 Scpiz6a_1 AUGUSTUS gene 25760003 25760716 - HOXD12 HOXD12
9 Scpiz6a_1 AUGUSTUS gene 25770564 25772996 - HOXD13 HOXD13
10 Scpiz6a_1 AUGUSTUS gene 25786550 25789410 + Evx2 EVX2
11 Scpiz6a_30.1 AUGUSTUS gene 12112697 12116784 - EVX1 EVX1
12 Scpiz6a_30.1 AUGUSTUS gene 12118635 12119576 - EVX1 EVX1
13 Scpiz6a_30.1 AUGUSTUS gene 12199764 12201101 + HOXA13 HOXA13
14 Scpiz6a_30.1 AUGUSTUS gene 12219716 12221513 + HOXA11 HOXA11
15 Scpiz6a_30.1 AUGUSTUS gene 12233452 12234549 + HOXA10 HOXA10
16 Scpiz6a_30.1 AUGUSTUS gene 12235671 12236180 + Hoxa10 HOXA10
17 Scpiz6a_30.1 AUGUSTUS gene 12243941 12246059 + HOXA9 HOXA9
18 Scpiz6a_30.1 AUGUSTUS gene 12254353 12254898 + HOXA7 HOXA7
19 Scpiz6a_30.1 AUGUSTUS gene 12255141 12256495 + HOXA7 HOXA7
20 Scpiz6a_30.1 AUGUSTUS gene 12263757 12265990 + HOXA6 HOXA6
21 Scpiz6a_30.1 AUGUSTUS gene 12267708 12269129 + HOXA5 HOXA5
22 Scpiz6a_30.1 AUGUSTUS gene 12282223 12283761 + HOXA4 HOXA4
23 Scpiz6a_30.1 AUGUSTUS gene 12303335 12306741 + HOXA3 HOXA3
24 Scpiz6a_30.1 AUGUSTUS gene 12312194 12314293 + HOXA2 HOXA2
25 Scpiz6a_30.1 AUGUSTUS gene 12319290 12323555 + Hoxa1 HOXA1
27 Scpiz6a_37 exonerate:protein2genome:local gene 9403550 9404943 - Hoxc3 HOXC3
28 Scpiz6a_37 AUGUSTUS gene 9474154 9474759 - HOXC4 HOXC4
29 Scpiz6a_37 AUGUSTUS gene 9499983 9501230 - Hoxc5 HOXC5
30 Scpiz6a_37 AUGUSTUS gene 9504857 9506031 - hoxc6 HOXC6
31 Scpiz6a_37 AUGUSTUS gene 9526250 9528579 - HOXC8 HOXC8
32 Scpiz6a_37 AUGUSTUS gene 9535748 9538860 - Hoxc9 HOXC9
33 Scpiz6a_37 AUGUSTUS gene 9550771 9554871 - HOXC10 HOXC10
34 Scpiz6a_37 AUGUSTUS gene 9562146 9568818 - hoxc11a HOXC11
35 Scpiz6a_37 AUGUSTUS gene 9589583 9593356 - hoxc12a HOXC12
36 Scpiz6a_37 AUGUSTUS gene 9593673 9594302 - HOXC12 HOXC12
37 Scpiz6a_37 AUGUSTUS gene 9606524 9610151 - hoxc13a HOXC13
39 Scpiz6a_86 maker gene 6098404 6105626 + Hoxb13 HOXB13
40 Scpiz6a_86 AUGUSTUS gene 6207845 6211976 + HOXB9 HOXB9
41 Scpiz6a_86 AUGUSTUS gene 6217173 6222005 + HOXB8 HOXB8
42 Scpiz6a_86 AUGUSTUS gene 6226105 6226683 + HOXB7 HOXB7
43 Scpiz6a_86 AUGUSTUS gene 6227951 6232453 + HOXB7 HOXB7
44 Scpiz6a_86 AUGUSTUS gene 6243660 6246408 + Hoxb6 HOXB6
45 Scpiz6a_86 AUGUSTUS gene 6249716 6250567 + hoxb5a HOXB5
46 Scpiz6a_86 AUGUSTUS gene 6252113 6263275 + HOXB5 HOXB5
47 Scpiz6a_86 AUGUSTUS gene 6272733 6275177 + HOXB4 HOXB4
48 Scpiz6a_86 AUGUSTUS gene 6308863 6317043 + hoxb3a HOXB3
49 Scpiz6a_86 AUGUSTUS gene 6328011 6330803 + HOXB2 HOXB2
50 Scpiz6a_86 AUGUSTUS gene 6356743 6360890 + HOXB1 HOXB1

Next I collapse the coordinates of gene models that share a factor_name into a single range of evidence for each HOX gene, running from the smallest start to the largest stop.

In [102]:
collapsedCoordsList = []
for hox_factor in hoxExonCoords.factor_name.unique():
    hox_df = hoxExonCoords[hoxExonCoords.factor_name == hox_factor]
    start = hox_df.start.min()
    stop = hox_df.stop.max()
    data_dict = {'seqid':hox_df.seqid.unique()[0], 'gene':hox_factor, 'start':start, 'stop':stop, 'strand':hox_df.strand.unique()[0]}
    collapsedCoordsList.append(data_dict)
    
hoxGeneCoordsDF = pd.DataFrame(collapsedCoordsList)
hoxGeneCoordsDF
Out[102]:
gene seqid start stop strand
0 HOXD1 Scpiz6a_1 25629515 25631702 -
1 HOXD3 Scpiz6a_1 25663228 25667221 -
2 HOXD4 Scpiz6a_1 25692499 25693667 -
3 HOXD8 Scpiz6a_1 25722293 25724423 -
4 HOXD9 Scpiz6a_1 25730347 25732122 -
5 HOXD10 Scpiz6a_1 25732652 25739472 -
6 HOXD11 Scpiz6a_1 25749545 25750504 -
7 HOXD12 Scpiz6a_1 25758704 25760716 -
8 HOXD13 Scpiz6a_1 25770564 25772996 -
9 EVX2 Scpiz6a_1 25786550 25789410 +
10 EVX1 Scpiz6a_30.1 12112697 12119576 -
11 HOXA13 Scpiz6a_30.1 12199764 12201101 +
12 HOXA11 Scpiz6a_30.1 12219716 12221513 +
13 HOXA10 Scpiz6a_30.1 12233452 12236180 +
14 HOXA9 Scpiz6a_30.1 12243941 12246059 +
15 HOXA7 Scpiz6a_30.1 12254353 12256495 +
16 HOXA6 Scpiz6a_30.1 12263757 12265990 +
17 HOXA5 Scpiz6a_30.1 12267708 12269129 +
18 HOXA4 Scpiz6a_30.1 12282223 12283761 +
19 HOXA3 Scpiz6a_30.1 12303335 12306741 +
20 HOXA2 Scpiz6a_30.1 12312194 12314293 +
21 HOXA1 Scpiz6a_30.1 12319290 12323555 +
22 HOXC3 Scpiz6a_37 9403550 9404943 -
23 HOXC4 Scpiz6a_37 9474154 9474759 -
24 HOXC5 Scpiz6a_37 9499983 9501230 -
25 HOXC6 Scpiz6a_37 9504857 9506031 -
26 HOXC8 Scpiz6a_37 9526250 9528579 -
27 HOXC9 Scpiz6a_37 9535748 9538860 -
28 HOXC10 Scpiz6a_37 9550771 9554871 -
29 HOXC11 Scpiz6a_37 9562146 9568818 -
30 HOXC12 Scpiz6a_37 9589583 9594302 -
31 HOXC13 Scpiz6a_37 9606524 9610151 -
32 HOXB13 Scpiz6a_86 6098404 6105626 +
33 HOXB9 Scpiz6a_86 6207845 6211976 +
34 HOXB8 Scpiz6a_86 6217173 6222005 +
35 HOXB7 Scpiz6a_86 6226105 6232453 +
36 HOXB6 Scpiz6a_86 6243660 6246408 +
37 HOXB5 Scpiz6a_86 6249716 6263275 +
38 HOXB4 Scpiz6a_86 6272733 6275177 +
39 HOXB3 Scpiz6a_86 6308863 6317043 +
40 HOXB2 Scpiz6a_86 6328011 6330803 +
41 HOXB1 Scpiz6a_86 6356743 6360890 +
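
The same collapse can probably be expressed as a single groupby aggregation instead of an explicit loop. The sketch below is an alternative formulation and not the code used for the figure. It runs on three toy rows copied from the table above, and it assumes, like the loop, that all models of one gene share a scaffold and strand.

```python
import pandas as pd

# Toy rows copied from the table above; the two HOXD12 models should merge.
models = pd.DataFrame({
    'seqid': ['Scpiz6a_1', 'Scpiz6a_1', 'Scpiz6a_1'],
    'start': [25758704, 25760003, 25770564],
    'stop': [25759860, 25760716, 25772996],
    'strand': ['-', '-', '-'],
    'factor_name': ['HOXD12', 'HOXD12', 'HOXD13'],
})

# sort=False keeps genes in order of first appearance, like looping over unique().
collapsed = (
    models.groupby('factor_name', sort=False)
    .agg({'seqid': 'first', 'start': 'min', 'stop': 'max', 'strand': 'first'})
    .reset_index()
    .rename(columns={'factor_name': 'gene'})
)
print(collapsed)
```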

I then normalize the coordinates so that each cluster starts at 0, by subtracting the smallest start on each scaffold from every start and stop on that scaffold.

In [103]:
for scaffold in hoxGeneCoordsDF.seqid.unique():
    min_start = hoxGeneCoordsDF[hoxGeneCoordsDF.seqid == scaffold].start.min()
    hoxGeneCoordsDF.loc[hoxGeneCoordsDF.seqid == scaffold, 'start'] = hoxGeneCoordsDF[hoxGeneCoordsDF.seqid == scaffold].start - min_start
    hoxGeneCoordsDF.loc[hoxGeneCoordsDF.seqid == scaffold, 'stop'] = hoxGeneCoordsDF[hoxGeneCoordsDF.seqid == scaffold].stop - min_start
hoxGeneCoordsDF
Out[103]:
gene seqid start stop strand
0 HOXD1 Scpiz6a_1 0 2187 -
1 HOXD3 Scpiz6a_1 33713 37706 -
2 HOXD4 Scpiz6a_1 62984 64152 -
3 HOXD8 Scpiz6a_1 92778 94908 -
4 HOXD9 Scpiz6a_1 100832 102607 -
5 HOXD10 Scpiz6a_1 103137 109957 -
6 HOXD11 Scpiz6a_1 120030 120989 -
7 HOXD12 Scpiz6a_1 129189 131201 -
8 HOXD13 Scpiz6a_1 141049 143481 -
9 EVX2 Scpiz6a_1 157035 159895 +
10 EVX1 Scpiz6a_30.1 0 6879 -
11 HOXA13 Scpiz6a_30.1 87067 88404 +
12 HOXA11 Scpiz6a_30.1 107019 108816 +
13 HOXA10 Scpiz6a_30.1 120755 123483 +
14 HOXA9 Scpiz6a_30.1 131244 133362 +
15 HOXA7 Scpiz6a_30.1 141656 143798 +
16 HOXA6 Scpiz6a_30.1 151060 153293 +
17 HOXA5 Scpiz6a_30.1 155011 156432 +
18 HOXA4 Scpiz6a_30.1 169526 171064 +
19 HOXA3 Scpiz6a_30.1 190638 194044 +
20 HOXA2 Scpiz6a_30.1 199497 201596 +
21 HOXA1 Scpiz6a_30.1 206593 210858 +
22 HOXC3 Scpiz6a_37 0 1393 -
23 HOXC4 Scpiz6a_37 70604 71209 -
24 HOXC5 Scpiz6a_37 96433 97680 -
25 HOXC6 Scpiz6a_37 101307 102481 -
26 HOXC8 Scpiz6a_37 122700 125029 -
27 HOXC9 Scpiz6a_37 132198 135310 -
28 HOXC10 Scpiz6a_37 147221 151321 -
29 HOXC11 Scpiz6a_37 158596 165268 -
30 HOXC12 Scpiz6a_37 186033 190752 -
31 HOXC13 Scpiz6a_37 202974 206601 -
32 HOXB13 Scpiz6a_86 0 7222 +
33 HOXB9 Scpiz6a_86 109441 113572 +
34 HOXB8 Scpiz6a_86 118769 123601 +
35 HOXB7 Scpiz6a_86 127701 134049 +
36 HOXB6 Scpiz6a_86 145256 148004 +
37 HOXB5 Scpiz6a_86 151312 164871 +
38 HOXB4 Scpiz6a_86 174329 176773 +
39 HOXB3 Scpiz6a_86 210459 218639 +
40 HOXB2 Scpiz6a_86 229607 232399 +
41 HOXB1 Scpiz6a_86 258339 262486 +
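
The per-scaffold shift can also be sketched without a loop, using groupby(...).transform to broadcast each scaffold's minimum start back onto its rows. This is an illustrative alternative on four toy rows taken from the tables above, not the code that produced the figure.

```python
import pandas as pd

# Toy subset of the collapsed coordinates (values copied from the table above).
coords = pd.DataFrame({
    'gene': ['HOXD1', 'HOXD3', 'EVX1', 'HOXA13'],
    'seqid': ['Scpiz6a_1', 'Scpiz6a_1', 'Scpiz6a_30.1', 'Scpiz6a_30.1'],
    'start': [25629515, 25663228, 12112697, 12199764],
    'stop': [25631702, 25667221, 12119576, 12201101],
})

# Smallest start per scaffold, aligned row by row with the original frame.
offset = coords.groupby('seqid')['start'].transform('min')
coords['start'] = coords['start'] - offset
coords['stop'] = coords['stop'] - offset
print(coords)
```

The offset is computed once, before either column is modified, so start and stop are shifted by the same amount.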
In [104]:
def coord_to_square(coord, level, color, width):
    start = coord[0]
    end = coord[1]
    polygon = PolygonPatch(
        Polygon(
            [(start, level - width),
             (start, level + width),
             (end, level + width),
             (end, level - width)]
        ), color=color
    )
    return polygon
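
Because every patch drawn here is an axis-aligned box, the same shape can be built with matplotlib.patches.Rectangle alone, without Shapely or descartes. The sketch below is a hypothetical counterpart to coord_to_square (the name coord_to_rectangle is an assumption) and was not used for the published figure.

```python
from matplotlib.patches import Rectangle


def coord_to_rectangle(coord, level, color, width):
    """Hypothetical Rectangle-based counterpart of coord_to_square."""
    start, end = coord
    # Anchor at the lower-left corner; the box is 2 * width tall, centred on level.
    return Rectangle((start, level - width), end - start, 2 * width, color=color)


# Same arguments as an evidence square for a gene spanning 0..2187.
patch = coord_to_rectangle(coord=(0, 2187), level=2, color='firebrick', width=0.45)
print(patch.get_xy(), patch.get_width(), patch.get_height())
```

The returned patch can be passed to ax.add_patch in the same way as the PolygonPatch objects.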
In [105]:
x_buffer = 10000
num_clusters = len(hoxGeneCoordsDF.seqid.unique())


sns.set(font_scale=1.0, style='white')
fig=plt.figure(figsize=(6.4,3.5))
fig.subplots_adjust(hspace=0,wspace=0)
gs = gridspec.GridSpec(num_clusters, 1)
base_cluster_length = hoxGeneCoordsDF.stop.max() + x_buffer
for i,scaffold in enumerate(['Scpiz6a_30.1','Scpiz6a_86','Scpiz6a_37','Scpiz6a_1']):
    ax = plt.subplot(gs[i, :])
    ax.set_ylim(0,4)
    #base_cluster_length = hoxGeneCoordsDF[hoxGeneCoordsDF.seqid == scaffold].stop.max() + x_buffer
    ax.set_xlim(-x_buffer, base_cluster_length + x_buffer)
    cluster_patch = coord_to_square(coord = (0,base_cluster_length), level = 2, color='lightgrey', width=0.15)
    ax.add_patch(cluster_patch)
    counter = 0 
    for _, row in hoxGeneCoordsDF[hoxGeneCoordsDF.seqid == scaffold].iterrows():
        if row['strand'] == '+':
            color = 'firebrick'
        else:
            color = 'midnightblue'
        evidence_square = coord_to_square(coord = (row['start'], row['stop']), level=2, color=color, width=0.45)
        ax.add_patch(evidence_square)
        if counter % 2 == 0:
            text_div = 15.0
        else:
            text_div = 1.3
        
        ax.annotate(row['gene'], xy=(row['start'] + ((row['stop'] - row['start']) / text_div),
                                   4.0), ha='left', fontsize=6, rotation=40)
        counter+=1 
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)
    ax.spines['bottom'].set_visible(False)
    ax.spines['left'].set_visible(False)
    ax.set_yticks([])
    xticks = ax.get_xticks()
    ax.set_xticks([])
ax.spines['bottom'].set_visible(True)
ax.set_xticks(np.arange(0,280000,20000))
x1kb_labels = [str(x/1000) for x in ax.get_xticks()]
ax.set_xticklabels(x1kb_labels,fontsize=minorFontSize,rotation=90)
ax.set_xlabel("scale (KB)")
fig.savefig('../fig2/SupplementalFigure3.pdf',bbox_inches='tight',pad_inches=0.1)
In [106]:
hoxGeneCoordsDF.seqid.unique()
['Scpiz6a_30.1','Scpiz6a_86','Scpiz6a_37','Scpiz6a_1']
Out[106]:
['Scpiz6a_30.1', 'Scpiz6a_86', 'Scpiz6a_37', 'Scpiz6a_1']

GATK

Running GATK

The following Python command-line tool was used to organize and run the GATK pipeline on our computational server. The tool outputs shell scripts that can be run directly from the command line. In hindsight, making this tool was a bit of a waste of time, given that GATK has since provided its own framework for running GATK.
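
The generated scripts share one pattern: each job is sent to the background, its process ID is captured in a shell variable, and a final wait blocks until every job has finished. A minimal sketch of that pattern is shown here. The helper background_block is hypothetical and condenses what make_proc_capture and the wait commands do in the tool, and the echo commands are placeholders.

```python
def background_block(commands):
    """Return shell text that runs each command in the background, then waits.

    Hypothetical helper condensing the make_proc_capture / wait pattern.
    """
    lines = ['#!/bin/bash']
    procs = []
    for n, command in enumerate(commands, 1):
        lines.append('(%s) &' % command)       # run in a backgrounded subshell
        lines.append('proc%d=$!' % n)          # remember the PID of that subshell
        procs.append('"$proc%d"' % n)
    lines.append('wait %s' % ' '.join(procs))  # block until every job has exited
    return '\n'.join(lines) + '\n'


script = background_block(['echo sample_A', 'echo sample_B'])
print(script)
```

Wrapping each command in parentheses means that $! refers to the whole group, even when a job consists of several steps joined with semicolons.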

In [ ]:
# %load ../bin/gatk6_pipeline.py
#!/usr/bin/env python
#Author: Duncan Tormey
#Email: dut@stowers.org or duncantormey@gmail.com

import os
import sys
import datetime
import argparse
from collections import defaultdict
sys.path = ["/home/dut/bin/python_scripts/"] + sys.path
import lims_classes as lims


def make_script_fh(base_name):
    # Tag the script name with today's date (YYYY-MM-DD)
    date = datetime.date.today().isoformat()
    name = './%s_%s.sh' % (base_name, date)
    fh = open(name, "w")
    return fh, name


def make_output_dir(directory_name):
    working_directory = os.getcwd()
    output_directory = working_directory + "/" + directory_name
    if not os.path.exists(output_directory):
        os.makedirs(output_directory)
    return output_directory


def make_proc_capture(i):
    proc_capture = 'proc%s=$!\n' % str(i)
    proc = '\"$proc%s\"' % str(i)
    return proc_capture, proc


def prep_reference_commands(fasta_path,  picard_tools_path, write_prep_ref):
    output_dir = make_output_dir("reference")
    fasta_name = os.path.basename(fasta_path)
    fasta_ref_path = output_dir + "/" + fasta_name
    # Swap only the file extension, so that .fa and .fasta both map to .dict
    fasta_dict_path = output_dir + "/" + os.path.splitext(fasta_name)[0] + ".dict"
    create_seq_dict = picard_tools_path + 'CreateSequenceDictionary.jar'
    sym_link = "ln -s %s %s/\n" % (fasta_path, output_dir)
    bwa_index = "bwa index -a bwtsw %s\n" % fasta_ref_path
    samtools_faidx = "samtools faidx %s\n" % fasta_ref_path
    sequence_dir = "java -jar %s REFERENCE=%s OUTPUT=%s\n" % (
        create_seq_dict, fasta_ref_path, fasta_dict_path)

    scriptname = None  # returned as None when no script is written
    if write_prep_ref:
        fh, scriptname = make_script_fh("gatk_0_prep_ref")
        fh.write("#!/bin/bash\n")
        fh.write(sym_link)
        fh.write(bwa_index)
        fh.write(samtools_faidx)
        fh.write(sequence_dir)

        fh.close()

    return fasta_ref_path, scriptname


def ret_down_sample_read_pairs(pairs, sample, dwn_frac,outdir):
    #print(pairs)
    dwn_frac = float(dwn_frac)
    r_pair = pairs[0].path
    f_pair = pairs[1].path
    seed=sum([ord(a) for a in sample])
    pairs[0].path = outdir + '/' + os.path.basename(r_pair).replace('.fq.gz', '.%sds.fq'%str(dwn_frac)).replace('.fastq.gz', '.%sds.fastq'%str(dwn_frac))
    pairs[1].path = outdir + '/' + os.path.basename(f_pair).replace('.fq.gz', '.%sds.fq'%str(dwn_frac)).replace('.fastq.gz', '.%sds.fastq'%str(dwn_frac))
    # Group both seqtk calls in one backgrounded subshell so that $! tracks the pair
    dwn_sample_cmd = '(seqtk sample -s%s %s %s > %s; seqtk sample -s%s %s %s > %s) &\n' % (seed, r_pair, dwn_frac, pairs[0].path,
                                                                                           seed, f_pair, dwn_frac, pairs[1].path)
    return dwn_sample_cmd, pairs

def down_sample_cmds(lane_pairs, down_sample, write, write_prefix, outdir):
    print('\n\n')
    print(lane_pairs)
    print('\n\n')
    outdir = make_output_dir(outdir)
    down_samples = dict(down_sample)
    scriptname = None  # returned as None when no script is written
    if write:
        fh, scriptname = make_script_fh(write_prefix)
        fh.write("#!/bin/bash\n")
    n = 1
    all_procs = []
    for sample in lane_pairs:
        if sample in down_samples:
            for  lane, pairs in lane_pairs[sample].items():
                print(sample)
                print(lane)
                print(pairs)
                cmd, pairs = ret_down_sample_read_pairs(pairs, sample, down_samples[sample], outdir)
                
                proc_capture, proc = make_proc_capture(n)
                all_procs.append(proc)
                if write:
                    fh.write(cmd)
                    fh.write(proc_capture)
                
                n+=1
                lane_pairs[sample][lane] = pairs
    if write:
        wait_command = "wait %s\n" % " ".join(all_procs)
        fh.write(wait_command)
        fh.close()  # flush the script to disk before returning its name
        
    return lane_pairs, scriptname
                
def ret_bwa_align_sort_cmd(pairs, sample, outdir, cpus, fasta_ref_path):
    cpus = str(cpus)
    r_pair = pairs[0].path
    f_pair = pairs[1].path
    lane_id = pairs[0].lane_id
    read_group = pairs[0].read_group

    output_bam = '%s/%s.bam' % (outdir, lane_id)
    bwa_command = "bwa mem -M -R \"%s\" -t %s %s %s %s" % (
        read_group, cpus, fasta_ref_path, r_pair, f_pair)
    sam_to_sorted_bam = ('samtools view -Sb -@ %s - | '
                         'samtools sort -o -@ %s - %s > %s') % (
        cpus, cpus, lane_id, output_bam)

    align_sort_cmd = "(%s | %s) &\n" % (bwa_command, sam_to_sorted_bam)

    return align_sort_cmd, output_bam


def ret_dedup_cmd(sample_input,  sample, outdir, picard_tools_path):
    dedup = '%s/MarkDuplicates.jar' % picard_tools_path
    dedup_basename = os.path.basename(sample_input).replace('.bam', '')
    dedup_metric = '%s/%s_metrics.txt' % (outdir, dedup_basename)
    dedup_output = '%s/%s.dedup.bam' % (outdir, dedup_basename)
    dedup_command = "java -Xmx4g -jar %s I=%s O=%s M=%s &\n" % (
        dedup, sample_input, dedup_output, dedup_metric)

    return dedup_command, dedup_output


def ret_merge_bams_cmd(sample_input,  sample, outdir, cpus, tempdir):
    cpus = str(cpus)
    bam_files = ' '.join(sample_input)
    temp_file = '%s/%s.temp' % (tempdir, sample)
    out_bam = '%s/%s.merged.bam' % (outdir, sample)
    merge_cmd = ('(samtools merge -@ %s - %s | samtools sort '
                 '- -m 10G -@ %s -T %s -o %s; samtools index %s) &\n') % (
                     cpus, bam_files, cpus, temp_file, out_bam, out_bam)

    return merge_cmd, out_bam


def ret_index_cmd(sample_input, sample, outdir):
    index_cmd = 'samtools index -b %s &\n' % sample_input

    return index_cmd, sample_input


def ret_realign_indels_cmd(sample_input, sample, outdir, cpus, fasta_ref_path, gatk_path):
    realign_basename = os.path.basename(sample_input).replace('.bam', '')
    realign_list = '%s/%s_target_intervals.list' % (outdir, realign_basename)
    out_bam = '%s/%s.realigned.bam' % (outdir, realign_basename)
    targets_cmd = '%s -T RealignerTargetCreator -R %s -I %s -o %s -nt %s' % (
        gatk_path, fasta_ref_path, sample_input, realign_list, cpus)

    raln_cmd = '%s -T IndelRealigner -R %s -I %s -targetIntervals %s -o %s' % (
        gatk_path, fasta_ref_path, sample_input, realign_list, out_bam)

    target_realign_cmd = '(%s; %s) &\n' % (targets_cmd, raln_cmd)

    return target_realign_cmd, out_bam


def ret_haplotype_caller_cmd(sample_input, sample, outdir, fasta_ref_path,
                             gatk_path, gatk_args, i, param_string=None):
    out_vars = '%s/%s.%s_raw_var.vcf' % (outdir, sample, str(i))
    if param_string:
        cmd = '%s -T HaplotypeCaller %s -R %s -I %s %s -o %s' % (
            gatk_path, param_string, fasta_ref_path,
            sample_input, gatk_args, out_vars)
    else:
        cmd = '%s -T HaplotypeCaller -R %s -I %s %s -o %s' % (
            gatk_path, fasta_ref_path, sample_input, gatk_args, out_vars)

    return cmd, out_vars


def ret_extract_snps_cmd(input_vars, sample, gatk_path, fasta_ref_path,
                         outdir, i):
    out_snps = '%s/%s.%s_raw_snps.vcf' % (outdir, sample, str(i))
    cmd = '%s -T SelectVariants -R %s -V %s -selectType SNP -o %s' % (
        gatk_path, fasta_ref_path, input_vars, out_snps)

    return cmd, out_snps


def ret_filter_snps_cmd(input_snps, sample, gatk_path, snp_filter,
                        snp_filter_name, fasta_ref_path, outdir, i):
    out_snps = '%s/%s.%s_filt_snps.vcf' % (outdir, sample, str(i))
    cmd = '%s -T VariantFiltration -R %s -V %s --filterExpression \"%s\" --filterName \"%s\" -o %s' % (
        gatk_path, fasta_ref_path, input_snps, snp_filter, snp_filter_name, out_snps)

    return cmd, out_snps


def ret_extract_indels_cmd(input_vars, sample, gatk_path, fasta_ref_path,
                           outdir, i):
    out_indels = '%s/%s.%s_raw_indels.vcf' % (outdir, sample, str(i))
    cmd = '%s -T SelectVariants -R %s -V %s -selectType INDEL -o %s' % (
        gatk_path, fasta_ref_path, input_vars, out_indels)

    return cmd, out_indels


def ret_filter_indels_cmd(input_indels, sample, gatk_path, indel_filter,
                          indel_filter_name, fasta_ref_path, outdir, i):
    out_indels = '%s/%s.%s_filt_indels.vcf' % (outdir, sample, str(i))
    cmd = '%s -T VariantFiltration -R %s -V %s --filterExpression \"%s\" --filterName \"%s\" -o %s' % (
        gatk_path, fasta_ref_path, input_indels, indel_filter,
        indel_filter_name, out_indels)

    return cmd, out_indels


def ret_combine_snps_cmd(all_snps, gatk_path, fasta_ref_path, outdir, i):
    out_snps = '%s/all_snps_%s.vcf' % (outdir, str(i))
    cmd = '%s -R %s -T CombineVariants --variant %s -o %s --excludeNonVariants --minimumN 1' % (
        gatk_path, fasta_ref_path, ' --variant '.join(all_snps), out_snps)

    return cmd, out_snps


def ret_combine_indels_cmd(all_indels, gatk_path, fasta_ref_path, outdir, i):
    out_indels = '%s/all_indels_%s.vcf' % (outdir, str(i))
    cmd = '%s -R %s -T CombineVariants --variant %s -o %s --excludeNonVariants --minimumN 1' % (
        gatk_path, fasta_ref_path, ' --variant '.join(all_indels), out_indels)

    return cmd, out_indels


def ret_vcf_tools_snp_cmd(input_snps, snp_filter_name):
    out_snps = input_snps.replace('.vcf', '')

    cmd = 'vcftools --vcf %s --remove-filtered LowQual --remove-filtered %s --recode --recode-INFO-all --out %s' % (
        input_snps, snp_filter_name, out_snps)

    out_snps = out_snps + '.recode.vcf'
    return cmd, out_snps


def ret_vcf_tools_indel_cmd(input_indels, indel_filter_name):
    out_indels = input_indels.replace('.vcf', '')
    cmd = 'vcftools --vcf %s --remove-filtered LowQual --remove-filtered %s --recode --recode-INFO-all --out %s' % (
        input_indels, indel_filter_name, out_indels)

    out_indels = out_indels + '.recode.vcf'
    return cmd, out_indels


def ret_first_recal_cmd(input_indels, input_snps, sample_input, sample, outdir,
                        gatk_path, fasta_ref_path, i):
    before_table = '%s/%s_%s.before_table' % (outdir, sample, str(i))
    cmd = '%s -T BaseRecalibrator -R %s -I %s -knownSites %s -knownSites %s -o %s' % (
        gatk_path, fasta_ref_path, sample_input, input_indels, input_snps,
        before_table)
    return cmd, before_table


def ret_print_reads_cmd(sample_input, recal_table, sample, outdir, gatk_path,
                        fasta_ref_path, i):
    out_bam = os.path.basename(sample_input).replace(
        '.bam', '.recal_%s.bam' % str(i))
    out_bam = '%s/%s' % (outdir, out_bam)
    cmd = '%s -T PrintReads -R %s -I %s -BQSR %s -o %s' % (
        gatk_path, fasta_ref_path, sample_input, recal_table, out_bam)

    return cmd, out_bam


def ret_second_recal_cmd(input_indels, input_snps, sample_input, before_table,
                         sample, outdir, gatk_path, fasta_ref_path, i):
    after_table = '%s/%s_%s.after_table' % (outdir, sample, str(i))
    cmd = '%s -T BaseRecalibrator -R %s -I %s -knownSites %s -knownSites %s -BQSR %s -o %s' % (
        gatk_path, fasta_ref_path, sample_input, input_snps, input_indels,
        before_table, after_table)
    return cmd, after_table


def ret_recal_plot_cmd(before_table, after_table, sample, outdir, gatk_path,
                       fasta_ref_path, i):
    recal_plot = '%s/%s_%s.plots' % (outdir, sample, str(i))
    cmd = '%s -T AnalyzeCovariates -R %s -before %s -after %s -plots %s' % (
        gatk_path, fasta_ref_path, before_table, after_table, recal_plot)

    return cmd


def ret_genotype_gvcf_command(sample_inputs, fasta_ref_path, gatk_path, cpus, outdir):
    all_inputs = '--variant %s' % ' --variant '.join(
        [item for sublist in sample_inputs.values() for item in sublist])
    output_gvcf = '%s/jg_%s.gvcf' % (outdir, '_'.join(sample_inputs.keys()))
    cmd = '%s -T GenotypeGVCFs -nt %s -ploidy 2 -R %s %s -allSites -o %s\n' % (
        gatk_path, cpus, fasta_ref_path, all_inputs, output_gvcf)

    return cmd, output_gvcf


def ret_hc_call_cmds(sample_input, sample, fasta_ref_path, gatk_path, outdir, i, param_string=None):
    gatk_args = " -stand_call_conf 30 -stand_emit_conf 30 -mbq 17 "
    snp_filter = 'QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0'
    indel_filter = 'QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0'
    indel_filter_name = 'default_indel_filter'
    snp_filter_name = 'default_snp_filter'
    haplotype_caller_cmd, out_vars = ret_haplotype_caller_cmd(
        sample_input=sample_input, sample=sample, outdir=outdir,
        fasta_ref_path=fasta_ref_path, gatk_path=gatk_path,
        gatk_args=gatk_args, i=i, param_string=param_string)

    extract_snps_cmd, raw_snps = ret_extract_snps_cmd(
        input_vars=out_vars, sample=sample, gatk_path=gatk_path,
        fasta_ref_path=fasta_ref_path, outdir=outdir, i=i)

    filter_snps_cmd, out_snps = ret_filter_snps_cmd(
        input_snps=raw_snps, sample=sample, gatk_path=gatk_path,
        snp_filter=snp_filter, snp_filter_name=snp_filter_name,
        fasta_ref_path=fasta_ref_path, outdir=outdir, i=i)

    extract_indels_cmd, raw_indels = ret_extract_indels_cmd(
        input_vars=out_vars, sample=sample, gatk_path=gatk_path,
        fasta_ref_path=fasta_ref_path, outdir=outdir, i=i)

    filter_indels_cmd, out_indels = ret_filter_indels_cmd(
        input_indels=raw_indels, sample=sample, gatk_path=gatk_path,
        indel_filter=indel_filter, indel_filter_name=indel_filter_name,
        fasta_ref_path=fasta_ref_path, outdir=outdir, i=i)

    hc_commands = '(%s) &\n' % '; '.join(
        [haplotype_caller_cmd, extract_snps_cmd, filter_snps_cmd,
         extract_indels_cmd, filter_indels_cmd])

    return hc_commands, out_snps, out_indels


def ret_recode_indels_and_snps_cmd(all_snps, all_indels, snp_filter_name,
                                   indel_filter_name, gatk_path,
                                   fasta_ref_path, outdir, i):

    combine_snps_cmd, out_snps = ret_combine_snps_cmd(
        all_snps=all_snps, gatk_path=gatk_path, fasta_ref_path=fasta_ref_path,
        outdir=outdir, i=i)

    combine_indels_cmd, out_indels = ret_combine_indels_cmd(
        all_indels=all_indels, gatk_path=gatk_path,
        fasta_ref_path=fasta_ref_path, outdir=outdir, i=i)

    vcf_tools_snp_cmd, recode_snps = ret_vcf_tools_snp_cmd(
        input_snps=out_snps, snp_filter_name=snp_filter_name)

    vcf_tools_indel_cmd, recode_indels = ret_vcf_tools_indel_cmd(
        input_indels=out_indels, indel_filter_name=indel_filter_name)

    combine_recode_cmd = '%s\n' % '\n'.join(
        [combine_snps_cmd, combine_indels_cmd, vcf_tools_snp_cmd,
         vcf_tools_indel_cmd])

    return combine_recode_cmd, recode_snps, recode_indels


def ret_recalibration_cmds(sample_input, sample, outdir, input_snps,
                           input_indels, gatk_path, fasta_ref_path, i):
    first_recal_cmd, before_table = ret_first_recal_cmd(
        input_indels=input_indels, input_snps=input_snps, sample_input=sample_input,
        sample=sample, outdir=outdir, gatk_path=gatk_path,
        fasta_ref_path=fasta_ref_path, i=i)

    print_reads_cmd, out_bam = ret_print_reads_cmd(
        sample_input=sample_input, recal_table=before_table, sample=sample,
        outdir=outdir, gatk_path=gatk_path, fasta_ref_path=fasta_ref_path,
        i=i)

    second_recal_cmd, after_table = ret_second_recal_cmd(
        input_indels=input_indels, input_snps=input_snps, sample_input=sample_input,
        before_table=before_table, sample=sample, outdir=outdir,
        gatk_path=gatk_path, fasta_ref_path=fasta_ref_path, i=i)

    recal_plot_cmd = ret_recal_plot_cmd(
        before_table=before_table, after_table=after_table, sample=sample,
        outdir=outdir, gatk_path=gatk_path, fasta_ref_path=fasta_ref_path, i=i)

    recal_cmd = '(%s) &\n' % '; '.join(
        [first_recal_cmd, print_reads_cmd, second_recal_cmd, recal_plot_cmd])

    return recal_cmd, out_bam


def final_variant_calling(input_dict, fasta_ref_path, gatk_path, cpus, write,
                          write_prefix, hcgvcfs):
    scriptname = None  # returned as None when no script is written
    if write:
        fh, scriptname = make_script_fh(write_prefix)
        fh.write("#!/bin/bash\n")
    gatk_args = " -stand_call_conf 30 -stand_emit_conf 30 -mbq 17 "
    param_string = '--variant_index_type LINEAR --variant_index_parameter 128000 -ERC GVCF'
    hc_outdir = make_output_dir('final_variant_calling/hc_variant_calling')
    jg_outdir = make_output_dir('final_variant_calling/joint_genotypes')
    variant_dict = defaultdict(list)
    snp_filter = 'QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0'
    indel_filter = 'QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0'
    indel_filter_name = 'default_indel_filter'
    snp_filter_name = 'default_snp_filter'
    p = 0
    job_procs = []

    for sample in input_dict:
        cmd, out_var = ret_haplotype_caller_cmd(
            sample_input=input_dict[sample][0], sample=sample,
            outdir=hc_outdir, fasta_ref_path=fasta_ref_path,
            gatk_path=gatk_path, gatk_args=gatk_args,
            i='final', param_string=param_string)
        variant_dict[sample].append(out_var)
        proc_capture, proc = make_proc_capture(p)
        job_procs.append(proc)
        if write:
            fh.write(cmd + ' &\n')
            fh.write(proc_capture)
        p += 1

    wait_command = "wait %s\n" % " ".join(job_procs)
    job_procs = []
    
    if write:
        fh.write(wait_command)

    if hcgvcfs:
        for sample, gvcf in hcgvcfs:
            variant_dict[sample].append(gvcf)
        
    genotype_cmd, out_gvcf = ret_genotype_gvcf_command(
        sample_inputs=variant_dict, fasta_ref_path=fasta_ref_path,
        gatk_path=gatk_path, cpus=cpus, outdir=jg_outdir)

    if write:
        fh.write(genotype_cmd)
    sample = '_'.join(input_dict.keys())
    extract_snps_cmd, raw_snps = ret_extract_snps_cmd(
        input_vars=out_gvcf, sample=sample, gatk_path=gatk_path,
        fasta_ref_path=fasta_ref_path, outdir=jg_outdir, i='final')

    filter_snps_cmd, out_snps = ret_filter_snps_cmd(
        input_snps=raw_snps, sample=sample, gatk_path=gatk_path,
        snp_filter=snp_filter, snp_filter_name=snp_filter_name,
        fasta_ref_path=fasta_ref_path, outdir=jg_outdir, i='final')

    extract_indels_cmd, raw_indels = ret_extract_indels_cmd(
        input_vars=out_gvcf, sample=sample, gatk_path=gatk_path,
        fasta_ref_path=fasta_ref_path, outdir=jg_outdir, i='final')

    filter_indels_cmd, out_indels = ret_filter_indels_cmd(
        input_indels=raw_indels, sample=sample, gatk_path=gatk_path,
        indel_filter=indel_filter, indel_filter_name=indel_filter_name,
        fasta_ref_path=fasta_ref_path, outdir=jg_outdir, i='final')

    remove_nocall_cmd = 'cat %s | grep -v \'\./\.\'  > %s' % (
        out_gvcf, out_gvcf.replace('.gvcf', '.no_call_removed.gvcf'))
    badsnps_cmd = 'cat %s | grep \'%s\' | cut -f 1,2 > %s/bad_snps.tsv' % (
        out_snps, snp_filter_name, jg_outdir)
    badindels_cmd = 'cat %s   | grep \'%s\' | cut -f 1,2 > %s/bad_indels.tsv' % (
        out_indels, indel_filter_name, jg_outdir)
    bad_positions_cmd = 'cat %s/bad_indels.tsv %s/bad_snps.tsv > %s/bad_positions.tsv' % (
        jg_outdir, jg_outdir, jg_outdir)
    remove_bad_positions = 'vcftools --vcf %s --recode --out %s --exclude-positions %s/bad_positions.tsv' % (
        out_gvcf.replace('.gvcf', '.no_call_removed.gvcf'), out_gvcf.replace('.gvcf', '.no_call_removed.filt'), jg_outdir)

    if write:
        fh.write(extract_snps_cmd + '\n')
        fh.write(filter_snps_cmd + '\n')
        fh.write(extract_indels_cmd + '\n')
        fh.write(filter_indels_cmd + '\n')
        fh.write(remove_nocall_cmd + '\n')
        fh.write(badsnps_cmd + '\n')
        fh.write(badindels_cmd + '\n')
        fh.write(bad_positions_cmd + '\n')
        fh.write(remove_bad_positions + '\n')

        fh.close()

    return out_gvcf, scriptname


def training_commands(input_dict, rounds, fasta_ref_path, gatk_path, write,
                      write_prefix):
    variant_outdir = make_output_dir('training/variant_calling')
    recal_outdir = make_output_dir('training/recal_bams')
    scriptname = None  # stays None when no script is written
    if write:
        fh, scriptname = make_script_fh(write_prefix)
        fh.write('#!/bin/bash\n')
    job_procs = []
    p = 0
    print(rounds, range(rounds))
    for i in range(rounds):
        all_snps = []
        all_indels = []
        if i == 0:
            bam_dict = input_dict
        else:
            bam_dict = recal_bams
        for sample in bam_dict:
            hc_cmd, snps, indels = ret_hc_call_cmds(
                sample_input=bam_dict[sample][0], sample=sample,
                fasta_ref_path=fasta_ref_path, gatk_path=gatk_path,
                outdir=variant_outdir, i=i)

            all_snps.append(snps)
            all_indels.append(indels)
            proc_capture, proc = make_proc_capture(p)
            job_procs.append(proc)
            if write:
                fh.write(hc_cmd)
                fh.write(proc_capture)
            p += 1
        wait_command = "wait %s\n" % " ".join(job_procs)
        job_procs = []
        if write:
            fh.write(wait_command)

        recode_cmd, recode_snps, recode_indels = ret_recode_indels_and_snps_cmd(
            all_snps=all_snps, all_indels=all_indels,
            snp_filter_name='default_snp_filter',
            indel_filter_name='default_indel_filter', gatk_path=gatk_path,
            fasta_ref_path=fasta_ref_path, outdir=variant_outdir, i=i)

        if write:
            fh.write(recode_cmd)
        recal_bams = defaultdict(list)
        if i < rounds - 1:
            for sample in bam_dict:
                recal_cmd, recal_bam = ret_recalibration_cmds(
                    input_snps=recode_snps, input_indels=recode_indels,
                    sample_input=bam_dict[sample][0], sample=sample,
                    gatk_path=gatk_path, fasta_ref_path=fasta_ref_path,
                    outdir=recal_outdir, i=i)

                recal_bams[sample].append(recal_bam)
                proc_capture, proc = make_proc_capture(p)
                job_procs.append(proc)
                if write:
                    fh.write(recal_cmd)
                    fh.write(proc_capture)
                p += 1
            wait_command = "wait %s\n" % " ".join(job_procs)
            job_procs = []
            if write:
                fh.write(wait_command)
    if write:
        fh.close()

    return recode_snps, recode_indels, scriptname


def parallel_cmd_loop(input_dict, nested, write, write_prefix,
                      cmd_function, outdir, **kwargs):
    outdir = make_output_dir(outdir)
    scriptname = None  # stays None when no script is written
    if write:
        fh, scriptname = make_script_fh(write_prefix)
        fh.write("#!/bin/bash\n")
    n = 1
    all_procs = []
    job_procs = []
    out_dict = defaultdict(list)
    for sample in input_dict:
        if nested:
            try:
                nested_inputs = input_dict[sample].values()
            except AttributeError:
                nested_inputs = input_dict[sample]

            for nested_input in nested_inputs:
                cmd, output = cmd_function(
                    nested_input, sample, outdir, **kwargs)
                proc_capture, proc = make_proc_capture(n)
                if write:
                    fh.write(cmd)
                    fh.write(proc_capture)

                job_procs.append(proc)
                all_procs.append(proc)

                if n % 3 == 0:
                    wait_command = "wait %s\n" % " ".join(job_procs)
                    job_procs = []
                    if write:
                        fh.write(wait_command)

                out_dict[sample].append(output)
                # print(out_dict)
                n += 1
        else:
            # print(input_dict[sample])
            if isinstance(input_dict[sample], list) and len(input_dict[sample]) == 1:
                sample_input = input_dict[sample][0]
            else:
                sample_input = input_dict[sample]

            cmd, output = cmd_function(
                sample_input=sample_input, sample=sample, outdir=outdir, **kwargs)
            # print(output)
            proc_capture, proc = make_proc_capture(n)
            if write:
                fh.write(cmd)
                fh.write(proc_capture)

            job_procs.append(proc)
            all_procs.append(proc)

            if n % 3 == 0:
                wait_command = "wait %s\n" % " ".join(job_procs)
                job_procs = []
                if write:
                    fh.write(wait_command)

            out_dict[sample].append(output)

            n += 1

    if write:
        wait_command = "wait %s\n" % " ".join(all_procs)
        fh.write(wait_command)
        fh.close()

    return out_dict, scriptname


def run(args):

    molng = lims.read_list_file(args.molngs)
    samples = lims.read_list_file(args.samples)
    flowcells = lims.read_list_file(args.flowcells)
    train = args.train
    
    fasta_path = args.fasta_path
    cpus = 8
    picard_tools_path = args.picard_path
    gatk = args.gatk_execute

    print(args.hcgvcfs)
    
    tigris_report = lims.Indexed_sample_reports(molng, samples)
    tigris_report.select_flowcells(flowcells)
    lane_pairs = tigris_report.ret_nested_pair_by_lane()
    scripts = []

    
    fasta_ref_path, s0 = prep_reference_commands(
        fasta_path, picard_tools_path, True)
    scripts.append(s0)

    if args.down_sample:
        print(args.down_sample)
        lane_pairs, dwn_s = down_sample_cmds(
            lane_pairs, args.down_sample, write=True,
            write_prefix='gatk_down_sample', outdir='preprocessing/down_sample')
        scripts.append(dwn_s)
        
    lane_bams, s1 = parallel_cmd_loop(
        input_dict=lane_pairs, nested=True, write=True,
        write_prefix='gatk_1_map_to_ref', cmd_function=ret_bwa_align_sort_cmd,
        outdir='preprocessing/map_to_ref_output', cpus=8,
        fasta_ref_path=fasta_ref_path)
    scripts.append(s1)
    
    dedup_lane_bams, s2 = parallel_cmd_loop(
        input_dict=lane_bams, nested=True, write=True,
        write_prefix='gatk_2_dedup_individual', cmd_function=ret_dedup_cmd,
        outdir='preprocessing/dedup_individuals',
        picard_tools_path=picard_tools_path)
    scripts.append(s2)
    
    sample_bams, s3 = parallel_cmd_loop(
        input_dict=dedup_lane_bams, nested=False, write=True,
        write_prefix='gatk_3_merge_bams', cmd_function=ret_merge_bams_cmd,
        outdir='preprocessing/merged_bams', cpus=8, tempdir='/scratch/dut')
    scripts.append(s3)
    
    deduped_sample_bams, s4 = parallel_cmd_loop(
        input_dict=sample_bams, nested=False, write=True,
        write_prefix='gatk_4_dedup_merged', cmd_function=ret_dedup_cmd,
        outdir='preprocessing/dedup_merged',
        picard_tools_path=picard_tools_path)
    scripts.append(s4)
    
    deduped_sample_bams, s5 = parallel_cmd_loop(
        input_dict=deduped_sample_bams, nested=False, write=True,
        write_prefix='gatk_5_index', cmd_function=ret_index_cmd,
        outdir='preprocessing/deduped_merged')
    scripts.append(s5)
    
    realigned_sample_bams, s6 = parallel_cmd_loop(
        input_dict=deduped_sample_bams, nested=False, write=True,
        write_prefix='gatk_6_realign_bams', cmd_function=ret_realign_indels_cmd,
        outdir='preprocessing/realigned_merged_bams', cpus=8,
        fasta_ref_path=fasta_ref_path, gatk_path=gatk)
    scripts.append(s6)
    
    if train:
        known_snps, known_indels, s7 = training_commands(
            input_dict=realigned_sample_bams, rounds=4, fasta_ref_path=fasta_ref_path,
            gatk_path=gatk, write=True, write_prefix='gatk_7_training')
        scripts.append(s7)
    elif args.known_snps and args.known_indels:
        known_snps = args.known_snps
        known_indels = args.known_indels
    else:
        print('train must be set to true or known snps and indels must be provided')
        sys.exit(1)
        
    recalibrated_bams, s8 = parallel_cmd_loop(
        input_dict=realigned_sample_bams, nested=False, write=True,
        write_prefix='gatk_8_recalibrate_bams', outdir='final_variant_calling/recal_bams',
        cmd_function=ret_recalibration_cmds, input_snps=known_snps,
        input_indels=known_indels, gatk_path=gatk, fasta_ref_path=fasta_ref_path, i='final')
    scripts.append(s8)
    
    gvcf, s9 = final_variant_calling(
        input_dict=recalibrated_bams, fasta_ref_path=fasta_ref_path,
        gatk_path=gatk, cpus=24, write=True,
        write_prefix='gatk_9_final_variant_calling', hcgvcfs=args.hcgvcfs)
    scripts.append(s9)
    
    fh, _ = make_script_fh('master')

    fh.write('#!/bin/bash\n')
    
    commands = ['echo \"%s\"\ntime %s &> %s\n' % (
        x.replace('./', '').replace('.sh', ''),
        x,
        x.replace('./', '').replace('.sh', '.out'),
    ) for x in scripts]
    commands = ''.join(commands)
    fh.write(commands)
    fh.close()

if __name__ == '__main__':
    parser = argparse.ArgumentParser()

    parser.add_argument('-molngs', action='store', dest='molngs',
                        help='path to file with all molngs paths separated by newlines')

    parser.add_argument('-samples', action='store', dest='samples',
                        help='path to file with all sample names separated by newlines')

    parser.add_argument('-flowcells', action='store', dest='flowcells',
                        help='path to file with all flowcell names separated by newlines')

    parser.add_argument('-down_sample', nargs=2, action='append',
                        help='pass sample name and fraction to down sample to, e.g. -down_sample Atig_122 0.33')
    
    parser.add_argument('-cpus', action='store', dest='cpus', default='8',
                        help='number of cpus to use per sample')

    parser.add_argument('-fasta_path', action='store', dest='fasta_path',
                        default='/home/dut/projects/tigris/genome_annotation/fasta/tigris_scaffolds_filt_10000.fa',
                        help='path to genome fasta')
    
    parser.add_argument('-picard_path', action='store', dest='picard_path',
                        default='/home/dut/bin/picard-tools-1.119/',
                        help='path to picard tools')

    parser.add_argument('-gatk_execute', action='store', dest='gatk_execute',
                        default='java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar',
                        help='gatk executable prefix')

    parser.add_argument('-train', action='store_true', dest='train',
                        default=False, help='train for known indels and snps')
    
    parser.add_argument('-known_snps', action='store', dest='known_snps',
                        help='path to known snps (required if -train is not set)')

    parser.add_argument('-known_indels', action='store', dest='known_indels',
                        help='path to known indels (required if -train is not set)')

    parser.add_argument('-add_hcgvcf',action='append',nargs=2, dest='hcgvcfs',
                        help='additional gvcf files produced by HaplotypeCaller to be used in joint genotyping (-add_hcgvcf sample_name path_to_file)')

    args = parser.parse_args()

    run(args)

This tool outputs a series of bash scripts that are meant to be run sequentially. While writing the pipeline and running our samples through it, I found multiple errors and had to restart steps. Below I include the bash scripts I ended up running, in sequential order, to generate our final GVCFs.
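
Several of the generated scripts share one shape: each command is sent to the background, its PID is captured in a `procN` variable, and a `wait` is issued after every third job. The sketch below reproduces that pattern in isolation. It is an illustration under stated assumptions, not the pipeline's own code: `make_proc_capture` is a hypothetical stand-in written to match the `procN=$!` lines in the scripts, `batch_background_jobs` and its `batch_size` parameter are invented names, and, unlike `parallel_cmd_loop`, it ends by waiting only on the jobs left over from the last incomplete batch.

```python
def make_proc_capture(n):
    """Hypothetical stand-in for the pipeline helper of the same name.

    Returns the bash line that stores the PID of the most recent
    background job, and the quoted variable reference passed to `wait`.
    """
    return 'proc%d=$!\n' % n, '"$proc%d"' % n


def batch_background_jobs(cmds, batch_size=3):
    """Build a script body that runs `cmds` in batches of `batch_size`."""
    lines = []
    pending = []
    for n, cmd in enumerate(cmds, start=1):
        capture, proc = make_proc_capture(n)
        # Run the command in a backgrounded subshell, then record its PID.
        lines.append('(%s) &\n' % cmd)
        lines.append(capture)
        pending.append(proc)
        if n % batch_size == 0:
            # Block until the whole batch has finished before starting more.
            lines.append('wait %s\n' % ' '.join(pending))
            pending = []
    if pending:
        # Assumption: also wait for a final, partially filled batch.
        lines.append('wait %s\n' % ' '.join(pending))
    return ''.join(lines)


script_body = batch_background_jobs(['echo job%d' % i for i in range(1, 6)])
print(script_body)
```

With five placeholder commands this yields one `wait` after the third job and a second after the fifth, so at most three jobs run at once.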

GATK Run on 8450, 001 and 003

Here are all of the bash commands run for the GATK pipeline. Some of the scripts are redundant because errors were found that merited re-running commands.
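
The `-R` read-group strings in the mapping script below appear to combine sample name, library, flowcell and lane. The helper that builds them is not shown in this section, so the following is a hypothetical reconstruction (`build_read_group` and its parameter names are assumptions) that reproduces the format of the first mapping command.

```python
def build_read_group(sample, library, flowcell, lane, platform='illumina'):
    """Return (read_group_id, header) for one lane of one library.

    Hypothetical helper: the field layout is inferred from the generated
    mapping script, not taken from the pipeline source.
    """
    rg_id = '%s_%s_%s_%s' % (sample, library, flowcell, lane)
    # The backslash-t is kept literal here; bwa mem converts "\t" in the
    # -R string into real tab characters when it writes the SAM header.
    header = '@RG\\tID:%s\\tSM:%s\\tPL:%s\\tLB:%s\\tPU:%s' % (
        rg_id, sample, platform, library, lane)
    return rg_id, header


rg_id, header = build_read_group('A_tigris8450', 'L13136', 'HF7YWADXXb', 2)
print(header)
```

Because the sample (`SM`) is shared across lanes while the ID is unique per lane, downstream tools can merge lanes per sample and still track each lane separately.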

In [107]:
%%bash
for file in ../data/gatk6/bash/*.sh
 do
  echo "------------------------------------------- ${file} -------------------------------------------"
  cat "$file"
done
------------------------------------------- ../data/gatk6/bash/gatk_0_prep_ref_2016-03-31.sh -------------------------------------------
#!/bin/bash
ln -s /home/dut/projects/tigris/genome_annotation/fasta/tigris_scaffolds_filt_10000.fa /home/dut/projects/tigris/heterozygosity/gatk6/reference/
bwa index -a bwtsw /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa
samtools faidx /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa
java -jar /home/dut/bin/picard-tools-1.119/CreateSequenceDictionary.jar REFERENCE=/home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa OUTPUT=/home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.dict
------------------------------------------- ../data/gatk6/bash/gatk_1_map_to_ref_2016-03-31.sh -------------------------------------------
#!/bin/bash
(bwa mem -M -R "@RG\tID:A_tigris8450_L13136_HF7YWADXXb_2\tSM:A_tigris8450\tPL:illumina\tLB:L13136\tPU:2" -t 8 /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/aan/MOLNG-1029/HF7YWADXXb/s_2_1_GTGGCC.fastq.gz /n/analysis/Baumann/aan/MOLNG-1029/HF7YWADXXb/s_2_2_GTGGCC.fastq.gz | samtools view -Sb -@ 8 - | samtools sort -o -@ 8 - A_tigris8450_L13136_HF7YWADXXb_2 > /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/A_tigris8450_L13136_HF7YWADXXb_2.bam) &
proc1=$!
(bwa mem -M -R "@RG\tID:A_tigris8450_L13136-1_HG7HJADXXb_2\tSM:A_tigris8450\tPL:illumina\tLB:L13136-1\tPU:2" -t 8 /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/aan/MOLNG-1029/HG7HJADXXb/s_2_1_GTGGCC.fastq.gz /n/analysis/Baumann/aan/MOLNG-1029/HG7HJADXXb/s_2_2_GTGGCC.fastq.gz | samtools view -Sb -@ 8 - | samtools sort -o -@ 8 - A_tigris8450_L13136-1_HG7HJADXXb_2 > /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/A_tigris8450_L13136-1_HG7HJADXXb_2.bam) &
proc2=$!
(bwa mem -M -R "@RG\tID:A_tigris8450_L13136-1_HG7HJADXXb_1\tSM:A_tigris8450\tPL:illumina\tLB:L13136-1\tPU:1" -t 8 /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/aan/MOLNG-1029/HG7HJADXXb/s_1_1_GTGGCC.fastq.gz /n/analysis/Baumann/aan/MOLNG-1029/HG7HJADXXb/s_1_2_GTGGCC.fastq.gz | samtools view -Sb -@ 8 - | samtools sort -o -@ 8 - A_tigris8450_L13136-1_HG7HJADXXb_1 > /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/A_tigris8450_L13136-1_HG7HJADXXb_1.bam) &
proc3=$!
wait "$proc1" "$proc2" "$proc3"
(bwa mem -M -R "@RG\tID:A_tigris8450_L13136_HBCBWADXXb_2\tSM:A_tigris8450\tPL:illumina\tLB:L13136\tPU:2" -t 8 /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/aan/MOLNG-1029/HBCBWADXXb/s_2_1_GTGGCC.fastq.gz /n/analysis/Baumann/aan/MOLNG-1029/HBCBWADXXb/s_2_2_GTGGCC.fastq.gz | samtools view -Sb -@ 8 - | samtools sort -o -@ 8 - A_tigris8450_L13136_HBCBWADXXb_2 > /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/A_tigris8450_L13136_HBCBWADXXb_2.bam) &
proc4=$!
(bwa mem -M -R "@RG\tID:A_tigris8450_L13136_HBCBWADXXb_1\tSM:A_tigris8450\tPL:illumina\tLB:L13136\tPU:1" -t 8 /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/aan/MOLNG-1029/HBCBWADXXb/s_1_1_GTGGCC.fastq.gz /n/analysis/Baumann/aan/MOLNG-1029/HBCBWADXXb/s_1_2_GTGGCC.fastq.gz | samtools view -Sb -@ 8 - | samtools sort -o -@ 8 - A_tigris8450_L13136_HBCBWADXXb_1 > /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/A_tigris8450_L13136_HBCBWADXXb_1.bam) &
proc5=$!
(bwa mem -M -R "@RG\tID:A_tigris8450_L13136_HF7YWADXXb_1\tSM:A_tigris8450\tPL:illumina\tLB:L13136\tPU:1" -t 8 /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/aan/MOLNG-1029/HF7YWADXXb/s_1_1_GTGGCC.fastq.gz /n/analysis/Baumann/aan/MOLNG-1029/HF7YWADXXb/s_1_2_GTGGCC.fastq.gz | samtools view -Sb -@ 8 - | samtools sort -o -@ 8 - A_tigris8450_L13136_HF7YWADXXb_1 > /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/A_tigris8450_L13136_HF7YWADXXb_1.bam) &
proc6=$!
wait "$proc4" "$proc5" "$proc6"
(bwa mem -M -R "@RG\tID:Atig_122_L21676_HJ2YHBCXX_2\tSM:Atig_122\tPL:illumina\tLB:L21676\tPU:2" -t 8 /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/rrs/MOLNG-1575/HJ2YHBCXX/s_2_1_CTCAGA.fastq.gz /n/analysis/Baumann/rrs/MOLNG-1575/HJ2YHBCXX/s_2_2_CTCAGA.fastq.gz | samtools view -Sb -@ 8 - | samtools sort -o -@ 8 - Atig_122_L21676_HJ2YHBCXX_2 > /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/Atig_122_L21676_HJ2YHBCXX_2.bam) &
proc7=$!
(bwa mem -M -R "@RG\tID:Atig_122_L21676_HJ2YHBCXX_1\tSM:Atig_122\tPL:illumina\tLB:L21676\tPU:1" -t 8 /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/rrs/MOLNG-1575/HJ2YHBCXX/s_1_1_CTCAGA.fastq.gz /n/analysis/Baumann/rrs/MOLNG-1575/HJ2YHBCXX/s_1_2_CTCAGA.fastq.gz | samtools view -Sb -@ 8 - | samtools sort -o -@ 8 - Atig_122_L21676_HJ2YHBCXX_1 > /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/Atig_122_L21676_HJ2YHBCXX_1.bam) &
proc8=$!
(bwa mem -M -R "@RG\tID:Atig003_L13088_HF7YWADXXb_1\tSM:Atig003\tPL:illumina\tLB:L13088\tPU:1" -t 8 /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/aan/MOLNG-1029/HF7YWADXXb/s_1_1_TAGCTT.fastq.gz /n/analysis/Baumann/aan/MOLNG-1029/HF7YWADXXb/s_1_2_TAGCTT.fastq.gz | samtools view -Sb -@ 8 - | samtools sort -o -@ 8 - Atig003_L13088_HF7YWADXXb_1 > /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/Atig003_L13088_HF7YWADXXb_1.bam) &
proc9=$!
wait "$proc7" "$proc8" "$proc9"
(bwa mem -M -R "@RG\tID:Atig003_L13088_HBCBWADXXb_1\tSM:Atig003\tPL:illumina\tLB:L13088\tPU:1" -t 8 /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/aan/MOLNG-1029/HBCBWADXXb/s_1_1_TAGCTT.fastq.gz /n/analysis/Baumann/aan/MOLNG-1029/HBCBWADXXb/s_1_2_TAGCTT.fastq.gz | samtools view -Sb -@ 8 - | samtools sort -o -@ 8 - Atig003_L13088_HBCBWADXXb_1 > /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/Atig003_L13088_HBCBWADXXb_1.bam) &
proc10=$!
(bwa mem -M -R "@RG\tID:Atig003_L13088_HBCBWADXXb_2\tSM:Atig003\tPL:illumina\tLB:L13088\tPU:2" -t 8 /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/aan/MOLNG-1029/HBCBWADXXb/s_2_1_TAGCTT.fastq.gz /n/analysis/Baumann/aan/MOLNG-1029/HBCBWADXXb/s_2_2_TAGCTT.fastq.gz | samtools view -Sb -@ 8 - | samtools sort -o -@ 8 - Atig003_L13088_HBCBWADXXb_2 > /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/Atig003_L13088_HBCBWADXXb_2.bam) &
proc11=$!
(bwa mem -M -R "@RG\tID:Atig003_L13088-1_HG7HJADXXb_1\tSM:Atig003\tPL:illumina\tLB:L13088-1\tPU:1" -t 8 /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/aan/MOLNG-1029/HG7HJADXXb/s_1_1_TAGCTT.fastq.gz /n/analysis/Baumann/aan/MOLNG-1029/HG7HJADXXb/s_1_2_TAGCTT.fastq.gz | samtools view -Sb -@ 8 - | samtools sort -o -@ 8 - Atig003_L13088-1_HG7HJADXXb_1 > /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/Atig003_L13088-1_HG7HJADXXb_1.bam) &
proc12=$!
wait "$proc10" "$proc11" "$proc12"
(bwa mem -M -R "@RG\tID:Atig003_L13088-1_HG7HJADXXb_2\tSM:Atig003\tPL:illumina\tLB:L13088-1\tPU:2" -t 8 /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/aan/MOLNG-1029/HG7HJADXXb/s_2_1_TAGCTT.fastq.gz /n/analysis/Baumann/aan/MOLNG-1029/HG7HJADXXb/s_2_2_TAGCTT.fastq.gz | samtools view -Sb -@ 8 - | samtools sort -o -@ 8 - Atig003_L13088-1_HG7HJADXXb_2 > /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/Atig003_L13088-1_HG7HJADXXb_2.bam) &
proc13=$!
(bwa mem -M -R "@RG\tID:Atig003_L13088_HF7YWADXXb_2\tSM:Atig003\tPL:illumina\tLB:L13088\tPU:2" -t 8 /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/aan/MOLNG-1029/HF7YWADXXb/s_2_1_TAGCTT.fastq.gz /n/analysis/Baumann/aan/MOLNG-1029/HF7YWADXXb/s_2_2_TAGCTT.fastq.gz | samtools view -Sb -@ 8 - | samtools sort -o -@ 8 - Atig003_L13088_HF7YWADXXb_2 > /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/Atig003_L13088_HF7YWADXXb_2.bam) &
proc14=$!
(bwa mem -M -R "@RG\tID:Atig001_L13087_HG7HJADXXb_2\tSM:Atig001\tPL:illumina\tLB:L13087\tPU:2" -t 8 /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/aan/MOLNG-1029/HG7HJADXXb/s_2_1_ATCACG.fastq.gz /n/analysis/Baumann/aan/MOLNG-1029/HG7HJADXXb/s_2_2_ATCACG.fastq.gz | samtools view -Sb -@ 8 - | samtools sort -o -@ 8 - Atig001_L13087_HG7HJADXXb_2 > /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/Atig001_L13087_HG7HJADXXb_2.bam) &
proc15=$!
wait "$proc13" "$proc14" "$proc15"
(bwa mem -M -R "@RG\tID:Atig001_L13087_HG7HJADXXb_1\tSM:Atig001\tPL:illumina\tLB:L13087\tPU:1" -t 8 /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/aan/MOLNG-1029/HG7HJADXXb/s_1_1_ATCACG.fastq.gz /n/analysis/Baumann/aan/MOLNG-1029/HG7HJADXXb/s_1_2_ATCACG.fastq.gz | samtools view -Sb -@ 8 - | samtools sort -o -@ 8 - Atig001_L13087_HG7HJADXXb_1 > /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/Atig001_L13087_HG7HJADXXb_1.bam) &
proc16=$!
(bwa mem -M -R "@RG\tID:Atig001_L13087_HF7YWADXXb_2\tSM:Atig001\tPL:illumina\tLB:L13087\tPU:2" -t 8 /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/aan/MOLNG-1029/HF7YWADXXb/s_2_1_ATCACG.fastq.gz /n/analysis/Baumann/aan/MOLNG-1029/HF7YWADXXb/s_2_2_ATCACG.fastq.gz | samtools view -Sb -@ 8 - | samtools sort -o -@ 8 - Atig001_L13087_HF7YWADXXb_2 > /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/Atig001_L13087_HF7YWADXXb_2.bam) &
proc17=$!
(bwa mem -M -R "@RG\tID:Atig001_L13087_HBCBWADXXb_2\tSM:Atig001\tPL:illumina\tLB:L13087\tPU:2" -t 8 /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/aan/MOLNG-1029/HBCBWADXXb/s_2_1_ATCACG.fastq.gz /n/analysis/Baumann/aan/MOLNG-1029/HBCBWADXXb/s_2_2_ATCACG.fastq.gz | samtools view -Sb -@ 8 - | samtools sort -o -@ 8 - Atig001_L13087_HBCBWADXXb_2 > /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/Atig001_L13087_HBCBWADXXb_2.bam) &
proc18=$!
wait "$proc16" "$proc17" "$proc18"
(bwa mem -M -R "@RG\tID:Atig001_L13087_HBCBWADXXb_1\tSM:Atig001\tPL:illumina\tLB:L13087\tPU:1" -t 8 /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/aan/MOLNG-1029/HBCBWADXXb/s_1_1_ATCACG.fastq.gz /n/analysis/Baumann/aan/MOLNG-1029/HBCBWADXXb/s_1_2_ATCACG.fastq.gz | samtools view -Sb -@ 8 - | samtools sort -o -@ 8 - Atig001_L13087_HBCBWADXXb_1 > /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/Atig001_L13087_HBCBWADXXb_1.bam) &
proc19=$!
(bwa mem -M -R "@RG\tID:Atig001_L13087_HF7YWADXXb_1\tSM:Atig001\tPL:illumina\tLB:L13087\tPU:1" -t 8 /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/aan/MOLNG-1029/HF7YWADXXb/s_1_1_ATCACG.fastq.gz /n/analysis/Baumann/aan/MOLNG-1029/HF7YWADXXb/s_1_2_ATCACG.fastq.gz | samtools view -Sb -@ 8 - | samtools sort -o -@ 8 - Atig001_L13087_HF7YWADXXb_1 > /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/Atig001_L13087_HF7YWADXXb_1.bam) &
proc20=$!
------------------------------------------- ../data/gatk6/bash/gatk_2_dedup_individual_2016-03-31.sh -------------------------------------------
#!/bin/bash
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/A_tigris8450_L13136_HF7YWADXXb_2.bam O=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/A_tigris8450_L13136_HF7YWADXXb_2.dedup.bam M=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/A_tigris8450_L13136_HF7YWADXXb_2_metrics.txt &
proc1=$!
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/A_tigris8450_L13136-1_HG7HJADXXb_2.bam O=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/A_tigris8450_L13136-1_HG7HJADXXb_2.dedup.bam M=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/A_tigris8450_L13136-1_HG7HJADXXb_2_metrics.txt &
proc2=$!
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/A_tigris8450_L13136-1_HG7HJADXXb_1.bam O=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/A_tigris8450_L13136-1_HG7HJADXXb_1.dedup.bam M=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/A_tigris8450_L13136-1_HG7HJADXXb_1_metrics.txt &
proc3=$!
wait "$proc1" "$proc2" "$proc3"
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/A_tigris8450_L13136_HBCBWADXXb_2.bam O=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/A_tigris8450_L13136_HBCBWADXXb_2.dedup.bam M=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/A_tigris8450_L13136_HBCBWADXXb_2_metrics.txt &
proc4=$!
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/A_tigris8450_L13136_HBCBWADXXb_1.bam O=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/A_tigris8450_L13136_HBCBWADXXb_1.dedup.bam M=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/A_tigris8450_L13136_HBCBWADXXb_1_metrics.txt &
proc5=$!
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/A_tigris8450_L13136_HF7YWADXXb_1.bam O=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/A_tigris8450_L13136_HF7YWADXXb_1.dedup.bam M=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/A_tigris8450_L13136_HF7YWADXXb_1_metrics.txt &
proc6=$!
wait "$proc4" "$proc5" "$proc6"
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/Atig_122_L21676_HJ2YHBCXX_2.bam O=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig_122_L21676_HJ2YHBCXX_2.dedup.bam M=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig_122_L21676_HJ2YHBCXX_2_metrics.txt &
proc7=$!
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/Atig_122_L21676_HJ2YHBCXX_1.bam O=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig_122_L21676_HJ2YHBCXX_1.dedup.bam M=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig_122_L21676_HJ2YHBCXX_1_metrics.txt &
proc8=$!
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/Atig003_L13088_HF7YWADXXb_1.bam O=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig003_L13088_HF7YWADXXb_1.dedup.bam M=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig003_L13088_HF7YWADXXb_1_metrics.txt &
proc9=$!
wait "$proc7" "$proc8" "$proc9"
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/Atig003_L13088_HBCBWADXXb_1.bam O=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig003_L13088_HBCBWADXXb_1.dedup.bam M=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig003_L13088_HBCBWADXXb_1_metrics.txt &
proc10=$!
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/Atig003_L13088_HBCBWADXXb_2.bam O=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig003_L13088_HBCBWADXXb_2.dedup.bam M=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig003_L13088_HBCBWADXXb_2_metrics.txt &
proc11=$!
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/Atig003_L13088-1_HG7HJADXXb_1.bam O=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig003_L13088-1_HG7HJADXXb_1.dedup.bam M=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig003_L13088-1_HG7HJADXXb_1_metrics.txt &
proc12=$!
wait "$proc10" "$proc11" "$proc12"
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/Atig003_L13088-1_HG7HJADXXb_2.bam O=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig003_L13088-1_HG7HJADXXb_2.dedup.bam M=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig003_L13088-1_HG7HJADXXb_2_metrics.txt &
proc13=$!
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/Atig003_L13088_HF7YWADXXb_2.bam O=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig003_L13088_HF7YWADXXb_2.dedup.bam M=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig003_L13088_HF7YWADXXb_2_metrics.txt &
proc14=$!
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/Atig001_L13087_HG7HJADXXb_2.bam O=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig001_L13087_HG7HJADXXb_2.dedup.bam M=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig001_L13087_HG7HJADXXb_2_metrics.txt &
proc15=$!
wait "$proc13" "$proc14" "$proc15"
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/Atig001_L13087_HG7HJADXXb_1.bam O=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig001_L13087_HG7HJADXXb_1.dedup.bam M=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig001_L13087_HG7HJADXXb_1_metrics.txt &
proc16=$!
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/Atig001_L13087_HF7YWADXXb_2.bam O=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig001_L13087_HF7YWADXXb_2.dedup.bam M=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig001_L13087_HF7YWADXXb_2_metrics.txt &
proc17=$!
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/Atig001_L13087_HBCBWADXXb_2.bam O=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig001_L13087_HBCBWADXXb_2.dedup.bam M=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig001_L13087_HBCBWADXXb_2_metrics.txt &
proc18=$!
wait "$proc16" "$proc17" "$proc18"
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/Atig001_L13087_HBCBWADXXb_1.bam O=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig001_L13087_HBCBWADXXb_1.dedup.bam M=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig001_L13087_HBCBWADXXb_1_metrics.txt &
proc19=$!
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/map_to_ref_output/Atig001_L13087_HF7YWADXXb_1.bam O=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig001_L13087_HF7YWADXXb_1.dedup.bam M=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig001_L13087_HF7YWADXXb_1_metrics.txt &
proc20=$!
wait "$proc19" "$proc20"
------------------------------------------- ../data/gatk6/bash/gatk_3_merge_bams_2016-03-31.sh -------------------------------------------
#!/bin/bash
(samtools merge -@ 8 - /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/A_tigris8450_L13136_HF7YWADXXb_2.dedup.bam /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/A_tigris8450_L13136-1_HG7HJADXXb_2.dedup.bam /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/A_tigris8450_L13136-1_HG7HJADXXb_1.dedup.bam /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/A_tigris8450_L13136_HBCBWADXXb_2.dedup.bam /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/A_tigris8450_L13136_HBCBWADXXb_1.dedup.bam /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/A_tigris8450_L13136_HF7YWADXXb_1.dedup.bam | samtools sort - -m 10G -@ 8 -T /scratch/dut/A_tigris8450.temp -o /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/merged_bams/A_tigris8450.merged.bam; samtools index /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/merged_bams/A_tigris8450.merged.bam) &
proc1=$!
(samtools merge -@ 8 - /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig_122_L21676_HJ2YHBCXX_2.dedup.bam /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig_122_L21676_HJ2YHBCXX_1.dedup.bam | samtools sort - -m 10G -@ 8 -T /scratch/dut/Atig_122.temp -o /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/merged_bams/Atig_122.merged.bam; samtools index /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/merged_bams/Atig_122.merged.bam) &
proc2=$!
(samtools merge -@ 8 - /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig003_L13088_HF7YWADXXb_1.dedup.bam /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig003_L13088_HBCBWADXXb_1.dedup.bam /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig003_L13088_HBCBWADXXb_2.dedup.bam /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig003_L13088-1_HG7HJADXXb_1.dedup.bam /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig003_L13088-1_HG7HJADXXb_2.dedup.bam /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig003_L13088_HF7YWADXXb_2.dedup.bam | samtools sort - -m 10G -@ 8 -T /scratch/dut/Atig003.temp -o /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/merged_bams/Atig003.merged.bam; samtools index /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/merged_bams/Atig003.merged.bam) &
proc3=$!
wait "$proc1" "$proc2" "$proc3"
(samtools merge -@ 8 - /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig001_L13087_HG7HJADXXb_2.dedup.bam /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig001_L13087_HG7HJADXXb_1.dedup.bam /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig001_L13087_HF7YWADXXb_2.dedup.bam /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig001_L13087_HBCBWADXXb_2.dedup.bam /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig001_L13087_HBCBWADXXb_1.dedup.bam /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig001_L13087_HF7YWADXXb_1.dedup.bam | samtools sort - -m 10G -@ 8 -T /scratch/dut/Atig001.temp -o /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/merged_bams/Atig001.merged.bam; samtools index /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/merged_bams/Atig001.merged.bam) &
proc4=$!
wait "$proc4"
------------------------------------------- ../data/gatk6/bash/gatk_4_dedup_merged_2016-03-31.sh -------------------------------------------
#!/bin/bash
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/merged_bams/A_tigris8450.merged.bam O=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_merged/A_tigris8450.merged.dedup.bam M=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/A_tigris8450.merged_metrics.txt &
proc1=$!
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/merged_bams/Atig_122.merged.bam O=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_merged/Atig_122.merged.dedup.bam M=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig_122.merged_metrics.txt &
proc2=$!
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/merged_bams/Atig003.merged.bam O=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_merged/Atig003.merged.dedup.bam M=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig003.merged_metrics.txt &
proc3=$!
wait "$proc1" "$proc2" "$proc3"
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/merged_bams/Atig001.merged.bam O=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_merged/Atig001.merged.dedup.bam M=/home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_individuals/Atig001.merged_metrics.txt &
proc4=$!
wait "$proc1" "$proc2" "$proc3" "$proc4"
------------------------------------------- ../data/gatk6/bash/gatk_5_index_2016-04-04.sh -------------------------------------------
#!/bin/bash
samtools index -b /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_merged/A_tigris8450.merged.dedup.bam &
proc1=$!
samtools index -b /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_merged/Atig_122.merged.dedup.bam &
proc2=$!
samtools index -b /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_merged/Atig003.merged.dedup.bam &
proc3=$!
wait "$proc1" "$proc2" "$proc3"
samtools index -b /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_merged/Atig001.merged.dedup.bam &
proc4=$!
wait "$proc1" "$proc2" "$proc3" "$proc4"
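The scripts above throttle work by hand: three jobs are started in the background, then `wait` blocks until that batch has finished. A minimal sketch of the same pattern as a reusable function is given here. The names `run_in_batches` and `index_bam` are hypothetical and are not part of the original pipeline; this is an illustration of the batching idea, not the code that generated these scripts.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not from the original pipeline):
# run one command per input, at most `batch_size` at a time, and wait for
# each batch to finish before starting the next one.
run_in_batches() {
    local batch_size=$1   # number of jobs allowed to run concurrently
    local cmd=$2          # command or shell function; each input is passed as its only argument
    shift 2
    local pids=() failed=0 pid item
    for item in "$@"; do
        "$cmd" "$item" &
        pids+=("$!")
        if (( ${#pids[@]} == batch_size )); then
            # Batch is full: block until every job in it has exited.
            for pid in "${pids[@]}"; do
                wait "$pid" || failed=1
            done
            pids=()
        fi
    done
    # Wait for the final, possibly partial, batch.
    for pid in "${pids[@]}"; do
        wait "$pid" || failed=1
    done
    return "$failed"
}

# Example use (commented out because it needs samtools and the BAM files):
# index_bam() { samtools index -b "$1"; }
# run_in_batches 3 index_bam \
#     /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_merged/*.merged.dedup.bam
```

Unlike the generated scripts, this sketch also waits for the last partial batch and returns a non-zero status if any job failed, so a calling script can stop before the next stage starts.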
------------------------------------------- ../data/gatk6/bash/gatk_6_realign_bams_2016-04-04.sh -------------------------------------------
#!/bin/bash
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T RealignerTargetCreator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_merged/A_tigris8450.merged.dedup.bam -o /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/A_tigris8450.merged.dedup_target_intervals.list -nt 8; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T IndelRealigner -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_merged/A_tigris8450.merged.dedup.bam -targetIntervals /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/A_tigris8450.merged.dedup_target_intervals.list -o /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/A_tigris8450.merged.dedup.realigned.bam) &
proc1=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T RealignerTargetCreator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_merged/Atig_122.merged.dedup.bam -o /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig_122.merged.dedup_target_intervals.list -nt 8; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T IndelRealigner -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_merged/Atig_122.merged.dedup.bam -targetIntervals /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig_122.merged.dedup_target_intervals.list -o /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig_122.merged.dedup.realigned.bam) &
proc2=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T RealignerTargetCreator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_merged/Atig003.merged.dedup.bam -o /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig003.merged.dedup_target_intervals.list -nt 8; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T IndelRealigner -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_merged/Atig003.merged.dedup.bam -targetIntervals /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig003.merged.dedup_target_intervals.list -o /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig003.merged.dedup.realigned.bam) &
proc3=$!
wait "$proc1" "$proc2" "$proc3"
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T RealignerTargetCreator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_merged/Atig001.merged.dedup.bam -o /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig001.merged.dedup_target_intervals.list -nt 8; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T IndelRealigner -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/dedup_merged/Atig001.merged.dedup.bam -targetIntervals /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig001.merged.dedup_target_intervals.list -o /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig001.merged.dedup.realigned.bam) &
proc4=$!
wait "$proc1" "$proc2" "$proc3" "$proc4"
------------------------------------------- ../data/gatk6/bash/gatk_7_training_2016-04-21.sh -------------------------------------------
#!/bin/bash
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/A_tigris8450.merged.dedup.realigned.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.0_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.0_raw_var.vcf -selectType SNP -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.0_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.0_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.0_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.0_raw_var.vcf -selectType INDEL -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.0_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.0_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.0_filt_indels.vcf) &
proc0=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig_122.merged.dedup.realigned.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.0_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.0_raw_var.vcf -selectType SNP -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.0_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.0_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.0_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.0_raw_var.vcf -selectType INDEL -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.0_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.0_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.0_filt_indels.vcf) &
proc1=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig003.merged.dedup.realigned.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.0_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.0_raw_var.vcf -selectType SNP -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.0_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.0_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.0_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.0_raw_var.vcf -selectType INDEL -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.0_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.0_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.0_filt_indels.vcf) &
proc2=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig001.merged.dedup.realigned.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.0_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.0_raw_var.vcf -selectType SNP -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.0_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.0_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.0_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.0_raw_var.vcf -selectType INDEL -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.0_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.0_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.0_filt_indels.vcf) &
proc3=$!
wait "$proc0" "$proc1" "$proc2" "$proc3"
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -T CombineVariants --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.0_filt_snps.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.0_filt_snps.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.0_filt_snps.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.0_filt_snps.vcf -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_0.vcf --excludeNonVariants --minimumN 1
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -T CombineVariants --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.0_filt_indels.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.0_filt_indels.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.0_filt_indels.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.0_filt_indels.vcf -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_0.vcf --excludeNonVariants --minimumN 1
vcftools --vcf /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_0.vcf --remove-filtered LowQual --remove-filtered default_snp_filter --recode --recode-INFO-all --out /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_0
vcftools --vcf /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_0.vcf --remove-filtered LowQual --remove-filtered default_indel_filter --recode --recode-INFO-all --out /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_0
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/A_tigris8450.merged.dedup.realigned.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_0.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_0.recode.vcf -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450_0.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/A_tigris8450.merged.dedup.realigned.bam -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450_0.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450.merged.dedup.realigned.recal_0.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/A_tigris8450.merged.dedup.realigned.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_0.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_0.recode.vcf -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450_0.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450_0.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -before /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450_0.before_table -after /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450_0.after_table -plots /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450_0.plots) &
proc4=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig_122.merged.dedup.realigned.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_0.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_0.recode.vcf -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122_0.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig_122.merged.dedup.realigned.bam -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122_0.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122.merged.dedup.realigned.recal_0.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig_122.merged.dedup.realigned.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_0.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_0.recode.vcf -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122_0.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122_0.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -before /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122_0.before_table -after /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122_0.after_table -plots /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122_0.plots) &
proc5=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig003.merged.dedup.realigned.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_0.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_0.recode.vcf -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003_0.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig003.merged.dedup.realigned.bam -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003_0.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003.merged.dedup.realigned.recal_0.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig003.merged.dedup.realigned.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_0.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_0.recode.vcf -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003_0.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003_0.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -before /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003_0.before_table -after /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003_0.after_table -plots /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003_0.plots) &
proc6=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig001.merged.dedup.realigned.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_0.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_0.recode.vcf -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001_0.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig001.merged.dedup.realigned.bam -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001_0.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001.merged.dedup.realigned.recal_0.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig001.merged.dedup.realigned.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_0.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_0.recode.vcf -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001_0.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001_0.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -before /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001_0.before_table -after /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001_0.after_table -plots /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001_0.plots) &
proc7=$!
wait "$proc4" "$proc5" "$proc6" "$proc7"
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450.merged.dedup.realigned.recal_0.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.1_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.1_raw_var.vcf -selectType SNP -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.1_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.1_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.1_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.1_raw_var.vcf -selectType INDEL -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.1_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.1_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.1_filt_indels.vcf) &
proc8=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122.merged.dedup.realigned.recal_0.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.1_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.1_raw_var.vcf -selectType SNP -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.1_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.1_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.1_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.1_raw_var.vcf -selectType INDEL -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.1_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.1_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.1_filt_indels.vcf) &
proc9=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003.merged.dedup.realigned.recal_0.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.1_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.1_raw_var.vcf -selectType SNP -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.1_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.1_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.1_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.1_raw_var.vcf -selectType INDEL -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.1_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.1_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.1_filt_indels.vcf) &
proc10=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001.merged.dedup.realigned.recal_0.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.1_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.1_raw_var.vcf -selectType SNP -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.1_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.1_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.1_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.1_raw_var.vcf -selectType INDEL -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.1_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.1_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.1_filt_indels.vcf) &
proc11=$!
wait "$proc8" "$proc9" "$proc10" "$proc11"
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -T CombineVariants --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.1_filt_snps.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.1_filt_snps.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.1_filt_snps.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.1_filt_snps.vcf -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_1.vcf --excludeNonVariants --minimumN 1
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -T CombineVariants --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.1_filt_indels.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.1_filt_indels.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.1_filt_indels.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.1_filt_indels.vcf -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_1.vcf --excludeNonVariants --minimumN 1
vcftools --vcf /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_1.vcf --remove-filtered LowQual --remove-filtered default_snp_filter --recode --recode-INFO-all --out /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_1
vcftools --vcf /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_1.vcf --remove-filtered LowQual --remove-filtered default_indel_filter --recode --recode-INFO-all --out /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_1
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450.merged.dedup.realigned.recal_0.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_1.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_1.recode.vcf -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450_1.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450.merged.dedup.realigned.recal_0.bam -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450_1.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450.merged.dedup.realigned.recal_0.recal_1.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450.merged.dedup.realigned.recal_0.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_1.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_1.recode.vcf -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450_1.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450_1.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -before /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450_1.before_table -after /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450_1.after_table -plots /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450_1.plots) &
proc12=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122.merged.dedup.realigned.recal_0.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_1.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_1.recode.vcf -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122_1.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122.merged.dedup.realigned.recal_0.bam -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122_1.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122.merged.dedup.realigned.recal_0.recal_1.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122.merged.dedup.realigned.recal_0.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_1.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_1.recode.vcf -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122_1.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122_1.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -before /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122_1.before_table -after /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122_1.after_table -plots /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122_1.plots) &
proc13=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003.merged.dedup.realigned.recal_0.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_1.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_1.recode.vcf -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003_1.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003.merged.dedup.realigned.recal_0.bam -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003_1.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003.merged.dedup.realigned.recal_0.recal_1.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003.merged.dedup.realigned.recal_0.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_1.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_1.recode.vcf -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003_1.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003_1.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -before /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003_1.before_table -after /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003_1.after_table -plots /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003_1.plots) &
proc14=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001.merged.dedup.realigned.recal_0.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_1.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_1.recode.vcf -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001_1.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001.merged.dedup.realigned.recal_0.bam -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001_1.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001.merged.dedup.realigned.recal_0.recal_1.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001.merged.dedup.realigned.recal_0.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_1.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_1.recode.vcf -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001_1.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001_1.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -before /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001_1.before_table -after /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001_1.after_table -plots /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001_1.plots) &
proc15=$!
wait "$proc12" "$proc13" "$proc14" "$proc15"
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450.merged.dedup.realigned.recal_0.recal_1.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.2_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.2_raw_var.vcf -selectType SNP -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.2_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.2_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.2_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.2_raw_var.vcf -selectType INDEL -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.2_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.2_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.2_filt_indels.vcf) &
proc16=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122.merged.dedup.realigned.recal_0.recal_1.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.2_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.2_raw_var.vcf -selectType SNP -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.2_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.2_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.2_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.2_raw_var.vcf -selectType INDEL -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.2_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.2_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.2_filt_indels.vcf) &
proc17=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003.merged.dedup.realigned.recal_0.recal_1.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.2_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.2_raw_var.vcf -selectType SNP -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.2_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.2_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.2_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.2_raw_var.vcf -selectType INDEL -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.2_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.2_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.2_filt_indels.vcf) &
proc18=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001.merged.dedup.realigned.recal_0.recal_1.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.2_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.2_raw_var.vcf -selectType SNP -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.2_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.2_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.2_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.2_raw_var.vcf -selectType INDEL -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.2_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.2_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.2_filt_indels.vcf) &
proc19=$!
wait "$proc16" "$proc17" "$proc18" "$proc19"
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -T CombineVariants --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.2_filt_snps.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.2_filt_snps.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.2_filt_snps.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.2_filt_snps.vcf -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_2.vcf --excludeNonVariants --minimumN 1
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -T CombineVariants --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.2_filt_indels.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.2_filt_indels.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.2_filt_indels.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.2_filt_indels.vcf -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_2.vcf --excludeNonVariants --minimumN 1
vcftools --vcf /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_2.vcf --remove-filtered LowQual --remove-filtered default_snp_filter --recode --recode-INFO-all --out /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_2
vcftools --vcf /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_2.vcf --remove-filtered LowQual --remove-filtered default_indel_filter --recode --recode-INFO-all --out /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_2
------------------------------------------- ../data/gatk6/bash/gatk_7_training_2016-05-05.sh -------------------------------------------
#!/bin/bash
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450.merged.dedup.realigned.recal_0.recal_1.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_2.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_2.recode.vcf -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450_2.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450.merged.dedup.realigned.recal_0.recal_1.bam -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450_2.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450.merged.dedup.realigned.recal_0.recal_1.recal_2.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450.merged.dedup.realigned.recal_0.recal_1.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_2.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_2.recode.vcf -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450_2.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450_2.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -before /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450_2.before_table -after /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450_2.after_table -plots /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450_2.plots) &
proc20=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122.merged.dedup.realigned.recal_0.recal_1.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_2.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_2.recode.vcf -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122_2.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122.merged.dedup.realigned.recal_0.recal_1.bam -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122_2.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122.merged.dedup.realigned.recal_0.recal_1.recal_2.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122.merged.dedup.realigned.recal_0.recal_1.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_2.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_2.recode.vcf -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122_2.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122_2.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -before /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122_2.before_table -after /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122_2.after_table -plots /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122_2.plots) &
proc21=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003.merged.dedup.realigned.recal_0.recal_1.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_2.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_2.recode.vcf -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003_2.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003.merged.dedup.realigned.recal_0.recal_1.bam -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003_2.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003.merged.dedup.realigned.recal_0.recal_1.recal_2.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003.merged.dedup.realigned.recal_0.recal_1.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_2.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_2.recode.vcf -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003_2.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003_2.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -before /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003_2.before_table -after /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003_2.after_table -plots /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003_2.plots) &
proc22=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001.merged.dedup.realigned.recal_0.recal_1.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_2.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_2.recode.vcf -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001_2.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001.merged.dedup.realigned.recal_0.recal_1.bam -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001_2.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001.merged.dedup.realigned.recal_0.recal_1.recal_2.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001.merged.dedup.realigned.recal_0.recal_1.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_2.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_2.recode.vcf -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001_2.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001_2.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -before /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001_2.before_table -after /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001_2.after_table -plots /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001_2.plots) &
proc23=$!
wait "$proc20" "$proc21" "$proc22" "$proc23"
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/A_tigris8450.merged.dedup.realigned.recal_0.recal_1.recal_2.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.3_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.3_raw_var.vcf -selectType SNP -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.3_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.3_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.3_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.3_raw_var.vcf -selectType INDEL -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.3_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.3_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.3_filt_indels.vcf) &
proc24=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig_122.merged.dedup.realigned.recal_0.recal_1.recal_2.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.3_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.3_raw_var.vcf -selectType SNP -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.3_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.3_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.3_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.3_raw_var.vcf -selectType INDEL -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.3_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.3_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.3_filt_indels.vcf) &
proc25=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig003.merged.dedup.realigned.recal_0.recal_1.recal_2.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.3_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.3_raw_var.vcf -selectType SNP -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.3_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.3_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.3_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.3_raw_var.vcf -selectType INDEL -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.3_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.3_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.3_filt_indels.vcf) &
proc26=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/training/recal_bams/Atig001.merged.dedup.realigned.recal_0.recal_1.recal_2.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.3_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.3_raw_var.vcf -selectType SNP -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.3_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.3_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.3_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.3_raw_var.vcf -selectType INDEL -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.3_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.3_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.3_filt_indels.vcf) &
proc27=$!
wait "$proc24" "$proc25" "$proc26" "$proc27"
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -T CombineVariants --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.3_filt_snps.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.3_filt_snps.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.3_filt_snps.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.3_filt_snps.vcf -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_3.vcf --excludeNonVariants --minimumN 1
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -T CombineVariants --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/A_tigris8450.3_filt_indels.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig_122.3_filt_indels.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig003.3_filt_indels.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/Atig001.3_filt_indels.vcf -o /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_3.vcf --excludeNonVariants --minimumN 1
vcftools --vcf /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_3.vcf --remove-filtered LowQual --remove-filtered default_snp_filter --recode --recode-INFO-all --out /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_3
vcftools --vcf /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_3.vcf --remove-filtered LowQual --remove-filtered default_indel_filter --recode --recode-INFO-all --out /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_3
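The hard filters passed to VariantFiltration in the script above flag a site when any clause of the expression is true. As a rough illustration only, the SNP expression can be mirrored in Python. This is not how GATK evaluates its JEXL expressions; the helper name `fails_snp_filter` and the handling of missing annotations are assumptions made for this sketch.

```python
# Thresholds copied from the --filterExpression named "default_snp_filter".
SNP_FILTER = [
    ("QD", "<", 2.0),
    ("FS", ">", 60.0),
    ("MQ", "<", 40.0),
    ("MQRankSum", "<", -12.5),
    ("ReadPosRankSum", "<", -8.0),
]


def fails_snp_filter(info):
    """Return True if any clause of the SNP hard filter is true.

    `info` is a dict of INFO annotations for one site (hypothetical input).
    Assumption: an annotation absent from `info` is simply skipped here;
    GATK itself may treat missing values differently.
    """
    for key, op, threshold in SNP_FILTER:
        if key not in info:
            continue
        value = float(info[key])
        if op == "<" and value < threshold:
            return True
        if op == ">" and value > threshold:
            return True
    return False


# A low-QD site is flagged; a site with unremarkable annotations is not.
print(fails_snp_filter({"QD": 1.5, "FS": 3.2, "MQ": 60.0}))
print(fails_snp_filter({"QD": 25.0, "FS": 3.2, "MQ": 60.0}))
```

Sites flagged this way keep their record but carry the filter name in the FILTER column, which is what the subsequent `vcftools --remove-filtered` calls act on.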
------------------------------------------- ../data/gatk6/bash/gatk_8_recalibrate_bams_2016-05-05.sh -------------------------------------------
#!/bin/bash
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/A_tigris8450.merged.dedup.realigned.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_3.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_3.recode.vcf -o /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/A_tigris8450_final.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/A_tigris8450.merged.dedup.realigned.bam -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/A_tigris8450_final.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/A_tigris8450.merged.dedup.realigned.recal_final.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/A_tigris8450.merged.dedup.realigned.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_3.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_3.recode.vcf -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/A_tigris8450_final.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/A_tigris8450_final.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -before /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/A_tigris8450_final.before_table -after /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/A_tigris8450_final.after_table -plots /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/A_tigris8450_final.plots) &
proc1=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig_122.merged.dedup.realigned.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_3.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_3.recode.vcf -o /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig_122_final.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig_122.merged.dedup.realigned.bam -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig_122_final.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig_122.merged.dedup.realigned.recal_final.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig_122.merged.dedup.realigned.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_3.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_3.recode.vcf -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig_122_final.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig_122_final.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -before /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig_122_final.before_table -after /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig_122_final.after_table -plots /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig_122_final.plots) &
proc2=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig003.merged.dedup.realigned.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_3.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_3.recode.vcf -o /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig003_final.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig003.merged.dedup.realigned.bam -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig003_final.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig003.merged.dedup.realigned.recal_final.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig003.merged.dedup.realigned.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_3.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_3.recode.vcf -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig003_final.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig003_final.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -before /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig003_final.before_table -after /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig003_final.after_table -plots /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig003_final.plots) &
proc3=$!
wait "$proc1" "$proc2" "$proc3"
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig001.merged.dedup.realigned.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_3.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_3.recode.vcf -o /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig001_final.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig001.merged.dedup.realigned.bam -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig001_final.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig001.merged.dedup.realigned.recal_final.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/preprocessing/realigned_merged_bams/Atig001.merged.dedup.realigned.bam -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_snps_3.recode.vcf -knownSites /home/dut/projects/tigris/heterozygosity/gatk6/training/variant_calling/all_indels_3.recode.vcf -BQSR /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig001_final.before_table -o /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig001_final.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -before /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig001_final.before_table -after /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig001_final.after_table -plots /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig001_final.plots) &
proc4=$!
wait "$proc4"
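Each sample in the script above runs the same four-step chain: BaseRecalibrator (before table), PrintReads, BaseRecalibrator again with `-BQSR` (after table), then AnalyzeCovariates. A minimal sketch of how such a script could be assembled from a list of sample names follows. It is not the generator actually used for these scripts: the function names `recal_chain` and `build_script` are hypothetical, the `-knownSites` order is kept the same in both BaseRecalibrator calls for brevity, and all samples are launched at once with a single `wait`.

```python
# Paths copied from the script above; everything else is illustrative.
GATK = "java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar"
BASE = "/home/dut/projects/tigris/heterozygosity/gatk6"
REF = BASE + "/reference/tigris_scaffolds_filt_10000.fa"
KNOWN = [
    BASE + "/training/variant_calling/all_indels_3.recode.vcf",
    BASE + "/training/variant_calling/all_snps_3.recode.vcf",
]


def recal_chain(sample):
    """Return the four ';'-joined GATK commands for one sample."""
    in_bam = "{0}/preprocessing/realigned_merged_bams/{1}.merged.dedup.realigned.bam".format(BASE, sample)
    out = "{0}/final_variant_calling/recal_bams/{1}".format(BASE, sample)
    known = " ".join("-knownSites " + k for k in KNOWN)
    steps = [
        "{g} -T BaseRecalibrator -R {r} -I {i} {k} -o {o}_final.before_table",
        "{g} -T PrintReads -R {r} -I {i} -BQSR {o}_final.before_table"
        " -o {o}.merged.dedup.realigned.recal_final.bam",
        "{g} -T BaseRecalibrator -R {r} -I {i} {k} -BQSR {o}_final.before_table"
        " -o {o}_final.after_table",
        "{g} -T AnalyzeCovariates -R {r} -before {o}_final.before_table"
        " -after {o}_final.after_table -plots {o}_final.plots",
    ]
    return "; ".join(s.format(g=GATK, r=REF, i=in_bam, k=known, o=out) for s in steps)


def build_script(samples):
    """Return a bash script that runs one background subshell per sample."""
    lines = ["#!/bin/bash"]
    pids = []
    for n, sample in enumerate(samples, 1):
        lines.append("({0}) &".format(recal_chain(sample)))
        lines.append("proc{0}=$!".format(n))
        pids.append('"$proc{0}"'.format(n))
    lines.append("wait " + " ".join(pids))
    return "\n".join(lines)


print(build_script(["A_tigris8450", "Atig_122", "Atig003", "Atig001"]))
```

Running each chain in its own subshell keeps the steps for one sample sequential while the samples themselves proceed in parallel.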
------------------------------------------- ../data/gatk6/bash/gatk_9_final_variant_calling_2016-05-17.sh -------------------------------------------
#!/bin/bash
#java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller --variant_index_type LINEAR --variant_index_parameter 128000 -ERC GVCF -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/A_tigris8450.merged.dedup.realigned.recal_final.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/hc_variant_calling/A_tigris8450.final_raw_var.vcf &
#proc0=$!
#java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller --variant_index_type LINEAR --variant_index_parameter 128000 -ERC GVCF -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig_122.merged.dedup.realigned.recal_final.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/hc_variant_calling/Atig_122.final_raw_var.vcf &
#proc1=$!
#java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller --variant_index_type LINEAR --variant_index_parameter 128000 -ERC GVCF -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig003.merged.dedup.realigned.recal_final.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/hc_variant_calling/Atig003.final_raw_var.vcf &
#proc2=$!
#java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller --variant_index_type LINEAR --variant_index_parameter 128000 -ERC GVCF -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig001.merged.dedup.realigned.recal_final.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/hc_variant_calling/Atig001.final_raw_var.vcf &
#proc3=$!
#wait "$proc0" "$proc1" "$proc2" "$proc3"
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T GenotypeGVCFs -nt 24 -ploidy 2 -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa --variant /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/hc_variant_calling/A_tigris8450.final_raw_var.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/hc_variant_calling/Atig_122.final_raw_var.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/hc_variant_calling/Atig003.final_raw_var.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/hc_variant_calling/Atig001.final_raw_var.vcf -allSites -o /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/joint_genotypes/jg_A_tigris8450_Atig_122_Atig003_Atig001.gvcf
------------------------------------------- ../data/gatk6/bash/gatk_9_final_variant_calling_2016-05-23.sh -------------------------------------------
#!/bin/bash
#java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller --variant_index_type LINEAR --variant_index_parameter 128000 -ERC GVCF -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/A_tigris8450.merged.dedup.realigned.recal_final.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/hc_variant_calling/A_tigris8450.final_raw_var.vcf &
#proc0=$!
#java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller --variant_index_type LINEAR --variant_index_parameter 128000 -ERC GVCF -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig_122.merged.dedup.realigned.recal_final.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/hc_variant_calling/Atig_122.final_raw_var.vcf &
#proc1=$!
#java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller --variant_index_type LINEAR --variant_index_parameter 128000 -ERC GVCF -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig003.merged.dedup.realigned.recal_final.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/hc_variant_calling/Atig003.final_raw_var.vcf &
#proc2=$!
#java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller --variant_index_type LINEAR --variant_index_parameter 128000 -ERC GVCF -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/recal_bams/Atig001.merged.dedup.realigned.recal_final.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/hc_variant_calling/Atig001.final_raw_var.vcf &
#proc3=$!
#wait "$proc0" "$proc1" "$proc2" "$proc3"
#java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T GenotypeGVCFs -nt 24 -ploidy 2 -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa --variant /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/hc_variant_calling/A_tigris8450.final_raw_var.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/hc_variant_calling/Atig_122.final_raw_var.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/hc_variant_calling/Atig003.final_raw_var.vcf --variant /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/hc_variant_calling/Atig001.final_raw_var.vcf -allSites -o /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/joint_genotypes/jg_A_tigris8450_Atig_122_Atig003_Atig001.gvcf
#java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/joint_genotypes/jg_A_tigris8450_Atig_122_Atig003_Atig001.gvcf -selectType SNP -o /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/joint_genotypes/A_tigris8450_Atig_122_Atig003_Atig001.final_raw_snps.vcf
#java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/joint_genotypes/A_tigris8450_Atig_122_Atig003_Atig001.final_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/joint_genotypes/A_tigris8450_Atig_122_Atig003_Atig001.final_filt_snps.vcf
#java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/joint_genotypes/jg_A_tigris8450_Atig_122_Atig003_Atig001.gvcf -selectType INDEL -o /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/joint_genotypes/A_tigris8450_Atig_122_Atig003_Atig001.final_raw_indels.vcf
#java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/gatk6/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/joint_genotypes/A_tigris8450_Atig_122_Atig003_Atig001.final_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/joint_genotypes/A_tigris8450_Atig_122_Atig003_Atig001.final_filt_indels.vcf
#cat /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/joint_genotypes/jg_A_tigris8450_Atig_122_Atig003_Atig001.gvcf | grep -v '\./\.'  > /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/joint_genotypes/jg_A_tigris8450_Atig_122_Atig003_Atig001.no_call_removed.gvcf
cat /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/joint_genotypes/A_tigris8450_Atig_122_Atig003_Atig001.final_filt_snps.vcf | grep 'default_snp_filter' | cut -f 1,2 > /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/hc_variant_calling/bad_snps.tsv
cat /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/joint_genotypes/A_tigris8450_Atig_122_Atig003_Atig001.final_filt_indels.vcf | grep 'default_indel_filter' | cut -f 1,2 > /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/hc_variant_calling/bad_indels.tsv
cat /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/hc_variant_calling/bad_indels.tsv /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/hc_variant_calling/bad_snps.tsv > /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/hc_variant_calling/bad_positions.tsv
vcftools --vcf /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/joint_genotypes/jg_A_tigris8450_Atig_122_Atig003_Atig001.no_call_removed.gvcf --recode --out /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/joint_genotypes/jg_A_tigris8450_Atig_122_Atig003_Atig001.no_call_removed.filt --exclude-positions /home/dut/projects/tigris/heterozygosity/gatk6/final_variant_calling/hc_variant_calling/bad_positions.tsv
------------------------------------------- ../data/gatk6/bash/master_2016-04-21.sh -------------------------------------------
#!/bin/bash
echo "gatk_0_prep_ref_2016-04-21"
time ./gatk_0_prep_ref_2016-04-21.sh &> gatk_0_prep_ref_2016-04-21.out
echo "gatk_1_map_to_ref_2016-04-21"
time ./gatk_1_map_to_ref_2016-04-21.sh &> gatk_1_map_to_ref_2016-04-21.out
echo "gatk_2_dedup_individual_2016-04-21"
time ./gatk_2_dedup_individual_2016-04-21.sh &> gatk_2_dedup_individual_2016-04-21.out
echo "gatk_3_merge_bams_2016-04-21"
time ./gatk_3_merge_bams_2016-04-21.sh &> gatk_3_merge_bams_2016-04-21.out
echo "gatk_4_dedup_merged_2016-04-21"
time ./gatk_4_dedup_merged_2016-04-21.sh &> gatk_4_dedup_merged_2016-04-21.out
echo "gatk_5_index_2016-04-21"
time ./gatk_5_index_2016-04-21.sh &> gatk_5_index_2016-04-21.out
echo "gatk_6_realign_bams_2016-04-21"
time ./gatk_6_realign_bams_2016-04-21.sh &> gatk_6_realign_bams_2016-04-21.out
echo "gatk_7_training_2016-04-21"
time ./gatk_7_training_2016-04-21.sh &> gatk_7_training_2016-04-21.out
echo "gatk_8_recalibrate_bams_2016-04-21"
time ./gatk_8_recalibrate_bams_2016-04-21.sh &> gatk_8_recalibrate_bams_2016-04-21.out
echo "gatk_9_final_variant_calling_2016-04-21"
time ./gatk_9_final_variant_calling_2016-04-21.sh &> gatk_9_final_variant_calling_2016-04-21.out
------------------------------------------- ../data/gatk6/bash/master_2016-05-05.sh -------------------------------------------
#!/bin/bash
echo "gatk_0_prep_ref_2016-05-05"
time ./gatk_0_prep_ref_2016-05-05.sh &> gatk_0_prep_ref_2016-05-05.out
echo "gatk_1_map_to_ref_2016-05-05"
time ./gatk_1_map_to_ref_2016-05-05.sh &> gatk_1_map_to_ref_2016-05-05.out
echo "gatk_2_dedup_individual_2016-05-05"
time ./gatk_2_dedup_individual_2016-05-05.sh &> gatk_2_dedup_individual_2016-05-05.out
echo "gatk_3_merge_bams_2016-05-05"
time ./gatk_3_merge_bams_2016-05-05.sh &> gatk_3_merge_bams_2016-05-05.out
echo "gatk_4_dedup_merged_2016-05-05"
time ./gatk_4_dedup_merged_2016-05-05.sh &> gatk_4_dedup_merged_2016-05-05.out
echo "gatk_5_index_2016-05-05"
time ./gatk_5_index_2016-05-05.sh &> gatk_5_index_2016-05-05.out
echo "gatk_6_realign_bams_2016-05-05"
time ./gatk_6_realign_bams_2016-05-05.sh &> gatk_6_realign_bams_2016-05-05.out
echo "gatk_7_training_2016-05-05"
time ./gatk_7_training_2016-05-05.sh &> gatk_7_training_2016-05-05.out
echo "gatk_8_recalibrate_bams_2016-05-05"
time ./gatk_8_recalibrate_bams_2016-05-05.sh &> gatk_8_recalibrate_bams_2016-05-05.out
echo "gatk_9_final_variant_calling_2016-05-05"
time ./gatk_9_final_variant_calling_2016-05-05.sh &> gatk_9_final_variant_calling_2016-05-05.out

GATK Run on 122

Data for the mother (Atig_122) was obtained later. We did not re-train; instead, we ran the mother separately through the pipeline, with her reads down-sampled to a fraction of 0.33 using seqtk, and then redid the joint genotyping steps. The scripts for this run are listed here.
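
Redoing the joint genotyping amounts to passing the mother's new per-sample gVCF to GenotypeGVCFs together with the three gVCFs from the earlier run. The following is a minimal sketch of how such a command line could be assembled; it uses only the flags that appear in the scripts listed in this section. The helper name `build_genotype_gvcfs_cmd` and the shortened file names are hypothetical and are not part of the original pipeline.

```python
def build_genotype_gvcfs_cmd(gatk_jar, reference, gvcfs, output, threads=24, ploidy=2):
    """Assemble a GATK 3.5 GenotypeGVCFs call as an argument list.

    Hypothetical helper for illustration; the flags mirror those used in
    gatk_9_final_variant_calling_2016-06-21.sh.
    """
    cmd = ["java", "-Xmx3g", "-jar", gatk_jar,
           "-T", "GenotypeGVCFs",
           "-nt", str(threads),
           "-ploidy", str(ploidy),
           "-R", reference]
    # One --variant argument per per-sample gVCF
    for gvcf in gvcfs:
        cmd += ["--variant", gvcf]
    # -allSites also emits non-variant positions
    cmd += ["-allSites", "-o", output]
    return cmd


# Sample names as used in the scripts; paths are shortened for readability
samples = ["A_tigris8450", "Atig_122", "Atig003", "Atig001"]
cmd = build_genotype_gvcfs_cmd(
    gatk_jar="GenomeAnalysisTK.jar",
    reference="tigris_scaffolds_filt_10000.fa",
    gvcfs=["{}.final_raw_var.vcf".format(s) for s in samples],
    output="jg_{}.gvcf".format("_".join(samples)),
)
print(" ".join(cmd))
```

Only the gVCF for Atig_122 has to be regenerated; the other three inputs can be reused from the earlier run, which is why re-training was not needed.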

In [108]:
%%bash
for file in ../data/dwn_sample_atig_122/bash/*.sh
do
  echo "------------------------------------------- ${file} -------------------------------------------"
  cat "$file"
done
------------------------------------------- ../data/dwn_sample_atig_122/bash/gatk_0_prep_ref_2016-06-21.sh -------------------------------------------
#!/bin/bash
ln -s /home/dut/projects/tigris/genome_annotation/fasta/tigris_scaffolds_filt_10000.fa /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/reference/
bwa index -a bwtsw /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/reference/tigris_scaffolds_filt_10000.fa
samtools faidx /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/reference/tigris_scaffolds_filt_10000.fa
java -jar /home/dut/bin/picard-tools-1.119/CreateSequenceDictionary.jar REFERENCE=/home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/reference/tigris_scaffolds_filt_10000.fa OUTPUT=/home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/reference/tigris_scaffolds_filt_10000.dict
------------------------------------------- ../data/dwn_sample_atig_122/bash/gatk_1_map_to_ref_2016-06-21.sh -------------------------------------------
#!/bin/bash
(bwa mem -M -R "@RG\tID:Atig_122_L21676_HJ2YHBCXX_2\tSM:Atig_122\tPL:illumina\tLB:L21676\tPU:2" -t 8 /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/reference/tigris_scaffolds_filt_10000.fa /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/down_sample/s_2_1_CTCAGA.0.33ds.fastq /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/down_sample/s_2_2_CTCAGA.0.33ds.fastq | samtools view -Sb -@ 8 - | samtools sort -o -@ 8 - Atig_122_L21676_HJ2YHBCXX_2 > /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/map_to_ref_output/Atig_122_L21676_HJ2YHBCXX_2.bam) &
proc1=$!
(bwa mem -M -R "@RG\tID:Atig_122_L21676_HJ2YHBCXX_1\tSM:Atig_122\tPL:illumina\tLB:L21676\tPU:1" -t 8 /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/reference/tigris_scaffolds_filt_10000.fa /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/down_sample/s_1_1_CTCAGA.0.33ds.fastq /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/down_sample/s_1_2_CTCAGA.0.33ds.fastq | samtools view -Sb -@ 8 - | samtools sort -o -@ 8 - Atig_122_L21676_HJ2YHBCXX_1 > /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/map_to_ref_output/Atig_122_L21676_HJ2YHBCXX_1.bam) &
proc2=$!
wait "$proc1" "$proc2"
------------------------------------------- ../data/dwn_sample_atig_122/bash/gatk_2_dedup_individual_2016-06-21.sh -------------------------------------------
#!/bin/bash
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/map_to_ref_output/Atig_122_L21676_HJ2YHBCXX_2.bam O=/home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/dedup_individuals/Atig_122_L21676_HJ2YHBCXX_2.dedup.bam M=/home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/dedup_individuals/Atig_122_L21676_HJ2YHBCXX_2_metrics.txt &
proc1=$!
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/map_to_ref_output/Atig_122_L21676_HJ2YHBCXX_1.bam O=/home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/dedup_individuals/Atig_122_L21676_HJ2YHBCXX_1.dedup.bam M=/home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/dedup_individuals/Atig_122_L21676_HJ2YHBCXX_1_metrics.txt &
proc2=$!
wait "$proc1" "$proc2"
------------------------------------------- ../data/dwn_sample_atig_122/bash/gatk_3_merge_bams_2016-06-21.sh -------------------------------------------
#!/bin/bash
(samtools merge -@ 8 - /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/dedup_individuals/Atig_122_L21676_HJ2YHBCXX_2.dedup.bam /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/dedup_individuals/Atig_122_L21676_HJ2YHBCXX_1.dedup.bam | samtools sort - -m 10G -@ 8 -T /scratch/dut/Atig_122.temp -o /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/merged_bams/Atig_122.merged.bam; samtools index /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/merged_bams/Atig_122.merged.bam) &
proc1=$!
wait "$proc1"
------------------------------------------- ../data/dwn_sample_atig_122/bash/gatk_4_dedup_merged_2016-06-21.sh -------------------------------------------
#!/bin/bash
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/merged_bams/Atig_122.merged.bam O=/home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/dedup_merged/Atig_122.merged.dedup.bam M=/home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/dedup_merged/Atig_122.merged_metrics.txt &
proc1=$!
wait "$proc1"
------------------------------------------- ../data/dwn_sample_atig_122/bash/gatk_5_index_2016-06-21.sh -------------------------------------------
#!/bin/bash
samtools index -b /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/dedup_merged/Atig_122.merged.dedup.bam &
proc1=$!
wait "$proc1"
------------------------------------------- ../data/dwn_sample_atig_122/bash/gatk_6_realign_bams_2016-06-21.sh -------------------------------------------
#!/bin/bash
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T RealignerTargetCreator -R /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/dedup_merged/Atig_122.merged.dedup.bam -o /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/realigned_merged_bams/Atig_122.merged.dedup_target_intervals.list -nt 8; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T IndelRealigner -R /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/dedup_merged/Atig_122.merged.dedup.bam -targetIntervals /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/realigned_merged_bams/Atig_122.merged.dedup_target_intervals.list -o /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/realigned_merged_bams/Atig_122.merged.dedup.realigned.bam) &
proc1=$!
wait "$proc1"
------------------------------------------- ../data/dwn_sample_atig_122/bash/gatk_8_recalibrate_bams_2016-06-21.sh -------------------------------------------
#!/bin/bash
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/realigned_merged_bams/Atig_122.merged.dedup.realigned.bam -knownSites ./../gatk6/training/variant_calling/all_indels_3.recode.vcf -knownSites ./../gatk6/training/variant_calling/all_snps_3.recode.vcf -o /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/recal_bams/Atig_122_final.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/realigned_merged_bams/Atig_122.merged.dedup.realigned.bam -BQSR /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/recal_bams/Atig_122_final.before_table -o /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/recal_bams/Atig_122.merged.dedup.realigned.recal_final.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/realigned_merged_bams/Atig_122.merged.dedup.realigned.bam -knownSites ./../gatk6/training/variant_calling/all_snps_3.recode.vcf -knownSites ./../gatk6/training/variant_calling/all_indels_3.recode.vcf -BQSR /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/recal_bams/Atig_122_final.before_table -o /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/recal_bams/Atig_122_final.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/reference/tigris_scaffolds_filt_10000.fa -before /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/recal_bams/Atig_122_final.before_table -after /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/recal_bams/Atig_122_final.after_table -plots /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/recal_bams/Atig_122_final.plots) &
proc1=$!
wait "$proc1"
------------------------------------------- ../data/dwn_sample_atig_122/bash/gatk_9_final_variant_calling_2016-06-21.sh -------------------------------------------
#!/bin/bash
#java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller --variant_index_type LINEAR --variant_index_parameter 128000 -ERC GVCF -R /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/reference/tigris_scaffolds_filt_10000.fa -I /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/recal_bams/Atig_122.merged.dedup.realigned.recal_final.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/hc_variant_calling/Atig_122.final_raw_var.vcf &
#proc0=$!
#wait "$proc0"
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T GenotypeGVCFs -nt 24 -ploidy 2 -R /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/reference/tigris_scaffolds_filt_10000.fa --variant ../gatk6/final_variant_calling/hc_variant_calling/A_tigris8450.final_raw_var.vcf --variant /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/hc_variant_calling/Atig_122.final_raw_var.vcf --variant ../gatk6/final_variant_calling/hc_variant_calling/Atig003.final_raw_var.vcf --variant ../gatk6/final_variant_calling/hc_variant_calling/Atig001.final_raw_var.vcf -allSites -o /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/joint_genotypes/jg_A_tigris8450_Atig_122_Atig003_Atig001.gvcf
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/joint_genotypes/jg_A_tigris8450_Atig_122_Atig003_Atig001.gvcf -selectType SNP -o /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/joint_genotypes/Atig_122.final_raw_snps.vcf
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/joint_genotypes/Atig_122.final_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/joint_genotypes/Atig_122.final_filt_snps.vcf
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/joint_genotypes/jg_A_tigris8450_Atig_122_Atig003_Atig001.gvcf -selectType INDEL -o /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/joint_genotypes/Atig_122.final_raw_indels.vcf
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/reference/tigris_scaffolds_filt_10000.fa -V /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/joint_genotypes/Atig_122.final_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/joint_genotypes/Atig_122.final_filt_indels.vcf
cat /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/joint_genotypes/jg_A_tigris8450_Atig_122_Atig003_Atig001.gvcf | grep -v '\./\.'  > /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/joint_genotypes/jg_A_tigris8450_Atig_122_Atig003_Atig001.no_call_removed.gvcf
cat /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/joint_genotypes/Atig_122.final_filt_snps.vcf | grep 'default_snp_filter' | cut -f 1,2 > /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/joint_genotypes/bad_snps.tsv
cat /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/joint_genotypes/Atig_122.final_filt_indels.vcf   | grep 'default_indel_filter' | cut -f 1,2 > /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/joint_genotypes/bad_indels.tsv
cat /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/joint_genotypes/bad_indels.tsv /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/joint_genotypes/bad_snps.tsv > /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/joint_genotypes/bad_positions.tsv
vcftools --vcf /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/joint_genotypes/jg_A_tigris8450_Atig_122_Atig003_Atig001.no_call_removed.gvcf --recode --out /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/joint_genotypes/jg_A_tigris8450_Atig_122_Atig003_Atig001.no_call_removed.filt --exclude-positions /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/final_variant_calling/joint_genotypes/bad_positions.tsv
------------------------------------------- ../data/dwn_sample_atig_122/bash/gatk_down_sample_2016-06-21.sh -------------------------------------------
#!/bin/bash
seqtk sample -s633 /n/analysis/Baumann/rrs/MOLNG-1575/HJ2YHBCXX/s_2_1_CTCAGA.fastq.gz 0.33 > /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/down_sample/s_2_1_CTCAGA.0.33ds.fastq; seqtk sample -s633 /n/analysis/Baumann/rrs/MOLNG-1575/HJ2YHBCXX/s_2_2_CTCAGA.fastq.gz 0.33 > /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/down_sample/s_2_2_CTCAGA.0.33ds.fastq &
proc1=$!
seqtk sample -s633 /n/analysis/Baumann/rrs/MOLNG-1575/HJ2YHBCXX/s_1_1_CTCAGA.fastq.gz 0.33 > /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/down_sample/s_1_1_CTCAGA.0.33ds.fastq; seqtk sample -s633 /n/analysis/Baumann/rrs/MOLNG-1575/HJ2YHBCXX/s_1_2_CTCAGA.fastq.gz 0.33 > /home/dut/projects/tigris/heterozygosity/dwn_sample_atig_122/preprocessing/down_sample/s_1_2_CTCAGA.0.33ds.fastq &
proc2=$!
wait "$proc1" "$proc2"
------------------------------------------- ../data/dwn_sample_atig_122/bash/master_2016-06-21.sh -------------------------------------------
#!/bin/bash
echo "gatk_0_prep_ref_2016-06-21"
time ./gatk_0_prep_ref_2016-06-21.sh &> gatk_0_prep_ref_2016-06-21.out
echo "gatk_down_sample_2016-06-21"
time ./gatk_down_sample_2016-06-21.sh &> gatk_down_sample_2016-06-21.out
echo "gatk_1_map_to_ref_2016-06-21"
time ./gatk_1_map_to_ref_2016-06-21.sh &> gatk_1_map_to_ref_2016-06-21.out
echo "gatk_2_dedup_individual_2016-06-21"
time ./gatk_2_dedup_individual_2016-06-21.sh &> gatk_2_dedup_individual_2016-06-21.out
echo "gatk_3_merge_bams_2016-06-21"
time ./gatk_3_merge_bams_2016-06-21.sh &> gatk_3_merge_bams_2016-06-21.out
echo "gatk_4_dedup_merged_2016-06-21"
time ./gatk_4_dedup_merged_2016-06-21.sh &> gatk_4_dedup_merged_2016-06-21.out
echo "gatk_5_index_2016-06-21"
time ./gatk_5_index_2016-06-21.sh &> gatk_5_index_2016-06-21.out
echo "gatk_6_realign_bams_2016-06-21"
time ./gatk_6_realign_bams_2016-06-21.sh &> gatk_6_realign_bams_2016-06-21.out
echo "gatk_8_recalibrate_bams_2016-06-21"
time ./gatk_8_recalibrate_bams_2016-06-21.sh &> gatk_8_recalibrate_bams_2016-06-21.out
echo "gatk_9_final_variant_calling_2016-06-21"
time ./gatk_9_final_variant_calling_2016-06-21.sh &> gatk_9_final_variant_calling_2016-06-21.out

GATK Initial Results Figure (Generated by Morgan Weichert)

These figures were generated by Morgan Weichert. They summarize the heterozygous sites detected by GATK with a few hard filters applied.
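
For reference, the SNP hard filter used in the VariantFiltration calls above flags a site when any one of five annotation thresholds is crossed. Below is a minimal sketch of that logic in Python. The function name `fails_snp_hard_filter` and the example annotation values are made up for illustration. Skipping annotations that are absent is a simplifying assumption, so GATK's own handling of missing annotations may differ.

```python
def fails_snp_hard_filter(info):
    """Return True if any clause of the SNP hard filter is met.

    `info` maps annotation names to floats. Thresholds mirror the expression
    "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0".
    Missing annotations are skipped (assumption, see text).
    """
    checks = [
        ("QD", lambda v: v < 2.0),
        ("FS", lambda v: v > 60.0),
        ("MQ", lambda v: v < 40.0),
        ("MQRankSum", lambda v: v < -12.5),
        ("ReadPosRankSum", lambda v: v < -8.0),
    ]
    return any(name in info and test(info[name]) for name, test in checks)


# Made-up annotation values, not taken from the real call set
passing_site = {"QD": 25.3, "FS": 1.2, "MQ": 60.0, "MQRankSum": 0.1, "ReadPosRankSum": 0.4}
strand_biased_site = dict(passing_site, FS=75.0)

print(fails_snp_hard_filter(passing_site), fails_snp_hard_filter(strand_biased_site))
```

The indel filter in the same scripts has the same structure with three clauses (QD < 2.0, FS > 200.0, ReadPosRankSum < -20.0).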

In [109]:
%%bash
cp /home/msr/Projects/marmorata_genome/gatk_analysis_A_tigris8450_Atig_122_Atig003_Atig001/fig/top5_8-36x_het_sites_partheno_mom.papersize.png.pdf ../fig2/SupplementalFigure4.pdf
In [110]:
cp /home/msr/Projects/marmorata_genome/gatk_analysis_A_tigris8450_Atig_122_Atig003_Atig001/fig/top5_8-36x_het_sites_partheno_mom.png ../fig/initial_gatk_results.png
In [111]:
Image('/home/msr/Projects/marmorata_genome/gatk_analysis_A_tigris8450_Atig_122_Atig003_Atig001/fig/top5_8-36x_het_sites_partheno_mom.png', height=600, width=600)
Out[111]:

Partial GATK Pipeline Run on Additional Animals

In [112]:
molng2139 = pd.read_csv('/n/analysis/Baumann/rrs/MOLNG-2139/H2LVWBCX2/Sample_Report.csv')
molng2140 = pd.read_csv('/n/analysis/Baumann/rrs/MOLNG-2140/H2JNNBCX2/Sample_Report.csv')
In [113]:
newAnimalSequencingDF = pd.concat([molng2139, molng2140])
In [114]:
newAnimalSequencingDF['estimated_coverage'] = (newAnimalSequencingDF['TotalReads'] * newAnimalSequencingDF['ReadLength']) / 1633406540.0
newAnimalSequencingDF.groupby('SampleName').sum().reset_index()[['SampleName', 'estimated_coverage']]
Out[114]:
SampleName estimated_coverage
0 A.tig_12512 18.999300
1 A.tig_12513 19.207432
2 A.tig_9721 18.722871
3 Atig_4278 19.743835
4 Atig_6993 18.510542
5 Atig_9177 20.961326
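
The coverage estimate above is the number of sequenced bases (reads times read length) divided by the constant 1633406540, which we take here to be the genome size in base pairs, summed over all rows belonging to one sample. The sketch below repeats that arithmetic on made-up numbers using only the standard library. The names `GENOME_SIZE_BP` and `estimated_coverage` and the toy rows are hypothetical, and how `TotalReads` is counted (reads or pairs) depends on the sample report and is not addressed here.

```python
from collections import defaultdict

# Denominator used in the cell above (assumed to be the genome size in bp)
GENOME_SIZE_BP = 1633406540.0


def estimated_coverage(total_reads, read_length, genome_size=GENOME_SIZE_BP):
    """Sequenced bases divided by genome size."""
    return total_reads * read_length / float(genome_size)


# Made-up lane-level rows (sample, TotalReads, ReadLength); NOT the real report
toy_rows = [
    ("toy_A", 100000000, 100),
    ("toy_A", 100000000, 100),
    ("toy_B", 150000000, 100),
]

# Sum the per-lane estimates for each sample, as groupby('SampleName').sum() does
per_sample = defaultdict(float)
for sample, total_reads, read_length in toy_rows:
    per_sample[sample] += estimated_coverage(total_reads, read_length)

for sample in sorted(per_sample):
    print(sample, round(per_sample[sample], 2))
```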
In [115]:
%%bash
more ../data/gatk_MOLNG-2139_MOLNG-2140/*txt | cat
::::::::::::::
../data/gatk_MOLNG-2139_MOLNG-2140/flowcells.txt
::::::::::::::
H2LVWBCX2
H2JNNBCX2
::::::::::::::
../data/gatk_MOLNG-2139_MOLNG-2140/gatk_command.txt
::::::::::::::
./../../bin/gatk6_pipeline.py -molngs molng.txt -samples samples.txt -flowcells flowcells.txt -new_sample_report -train
::::::::::::::
../data/gatk_MOLNG-2139_MOLNG-2140/molng.txt
::::::::::::::
/n/analysis/Baumann/rrs/MOLNG-2139/
/n/analysis/Baumann/rrs/MOLNG-2140/
::::::::::::::
../data/gatk_MOLNG-2139_MOLNG-2140/samples.txt
::::::::::::::
Atig_4278
Atig_6993
Atig_9177
A.tig_12512
A.tig_12513
A.tig_9721
In [116]:
%%bash
more ../data/gatk_MOLNG-2139_MOLNG-2140/*sh | cat
::::::::::::::
../data/gatk_MOLNG-2139_MOLNG-2140/gatk_0_prep_ref_2017-11-10.sh
::::::::::::::
#!/bin/bash
ln -s /home/dut/projects/tigris/genome_annotation/fasta/tigris_scaffolds_filt_10000.fa /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/
bwa index -a bwtsw /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa
samtools faidx /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa
java -jar /home/dut/bin/picard-tools-1.119/CreateSequenceDictionary.jar REFERENCE=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa OUTPUT=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.dict
::::::::::::::
../data/gatk_MOLNG-2139_MOLNG-2140/gatk_1_map_to_ref_2017-11-10.sh
::::::::::::::
#!/bin/bash
(bwa mem -M -R "@RG\tID:Atig_4278_L30700_H2LVWBCX2_1\tSM:Atig_4278\tPL:illumina\tLB:L30700\tPU:1" -t 8 /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/rrs/MOLNG-2139/H2LVWBCX2/s_1_1_GCGCTA.fastq.gz /n/analysis/Baumann/rrs/MOLNG-2139/H2LVWBCX2/s_1_2_GCGCTA.fastq.gz | samtools view -Sb -@ 8 - | samtools sort - -@ 8 -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/map_to_ref_output/Atig_4278_L30700_H2LVWBCX2_1.bam) &
proc1=$!
(bwa mem -M -R "@RG\tID:Atig_4278_L30700_H2LVWBCX2_2\tSM:Atig_4278\tPL:illumina\tLB:L30700\tPU:2" -t 8 /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/rrs/MOLNG-2139/H2LVWBCX2/s_2_1_GCGCTA.fastq.gz /n/analysis/Baumann/rrs/MOLNG-2139/H2LVWBCX2/s_2_2_GCGCTA.fastq.gz | samtools view -Sb -@ 8 - | samtools sort - -@ 8 -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/map_to_ref_output/Atig_4278_L30700_H2LVWBCX2_2.bam) &
proc2=$!
(bwa mem -M -R "@RG\tID:A.tig_12512_L30701_H2JNNBCX2_1\tSM:A.tig_12512\tPL:illumina\tLB:L30701\tPU:1" -t 8 /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/rrs/MOLNG-2140/H2JNNBCX2/s_1_1_ATTCCT.fastq.gz /n/analysis/Baumann/rrs/MOLNG-2140/H2JNNBCX2/s_1_2_ATTCCT.fastq.gz | samtools view -Sb -@ 8 - | samtools sort - -@ 8 -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/map_to_ref_output/A.tig_12512_L30701_H2JNNBCX2_1.bam) &
proc3=$!
wait "$proc1" "$proc2" "$proc3"
(bwa mem -M -R "@RG\tID:A.tig_12512_L30701_H2JNNBCX2_2\tSM:A.tig_12512\tPL:illumina\tLB:L30701\tPU:2" -t 8 /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/rrs/MOLNG-2140/H2JNNBCX2/s_2_1_ATTCCT.fastq.gz /n/analysis/Baumann/rrs/MOLNG-2140/H2JNNBCX2/s_2_2_ATTCCT.fastq.gz | samtools view -Sb -@ 8 - | samtools sort - -@ 8 -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/map_to_ref_output/A.tig_12512_L30701_H2JNNBCX2_2.bam) &
proc4=$!
(bwa mem -M -R "@RG\tID:A.tig_12513_L30702_H2JNNBCX2_1\tSM:A.tig_12513\tPL:illumina\tLB:L30702\tPU:1" -t 8 /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/rrs/MOLNG-2140/H2JNNBCX2/s_1_1_CGGAAT.fastq.gz /n/analysis/Baumann/rrs/MOLNG-2140/H2JNNBCX2/s_1_2_CGGAAT.fastq.gz | samtools view -Sb -@ 8 - | samtools sort - -@ 8 -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/map_to_ref_output/A.tig_12513_L30702_H2JNNBCX2_1.bam) &
proc5=$!
(bwa mem -M -R "@RG\tID:A.tig_12513_L30702_H2JNNBCX2_2\tSM:A.tig_12513\tPL:illumina\tLB:L30702\tPU:2" -t 8 /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/rrs/MOLNG-2140/H2JNNBCX2/s_2_1_CGGAAT.fastq.gz /n/analysis/Baumann/rrs/MOLNG-2140/H2JNNBCX2/s_2_2_CGGAAT.fastq.gz | samtools view -Sb -@ 8 - | samtools sort - -@ 8 -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/map_to_ref_output/A.tig_12513_L30702_H2JNNBCX2_2.bam) &
proc6=$!
wait "$proc4" "$proc5" "$proc6"
(bwa mem -M -R "@RG\tID:A.tig_9721_L30703_H2JNNBCX2_1\tSM:A.tig_9721\tPL:illumina\tLB:L30703\tPU:1" -t 8 /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/rrs/MOLNG-2140/H2JNNBCX2/s_1_1_TCATTC.fastq.gz /n/analysis/Baumann/rrs/MOLNG-2140/H2JNNBCX2/s_1_2_TCATTC.fastq.gz | samtools view -Sb -@ 8 - | samtools sort - -@ 8 -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/map_to_ref_output/A.tig_9721_L30703_H2JNNBCX2_1.bam) &
proc7=$!
(bwa mem -M -R "@RG\tID:A.tig_9721_L30703_H2JNNBCX2_2\tSM:A.tig_9721\tPL:illumina\tLB:L30703\tPU:2" -t 8 /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/rrs/MOLNG-2140/H2JNNBCX2/s_2_1_TCATTC.fastq.gz /n/analysis/Baumann/rrs/MOLNG-2140/H2JNNBCX2/s_2_2_TCATTC.fastq.gz | samtools view -Sb -@ 8 - | samtools sort - -@ 8 -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/map_to_ref_output/A.tig_9721_L30703_H2JNNBCX2_2.bam) &
proc8=$!
(bwa mem -M -R "@RG\tID:Atig_6993_L30699_H2LVWBCX2_2\tSM:Atig_6993\tPL:illumina\tLB:L30699\tPU:2" -t 8 /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/rrs/MOLNG-2139/H2LVWBCX2/s_2_1_CACTCA.fastq.gz /n/analysis/Baumann/rrs/MOLNG-2139/H2LVWBCX2/s_2_2_CACTCA.fastq.gz | samtools view -Sb -@ 8 - | samtools sort - -@ 8 -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/map_to_ref_output/Atig_6993_L30699_H2LVWBCX2_2.bam) &
proc9=$!
wait "$proc7" "$proc8" "$proc9"
(bwa mem -M -R "@RG\tID:Atig_6993_L30699_H2LVWBCX2_1\tSM:Atig_6993\tPL:illumina\tLB:L30699\tPU:1" -t 8 /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/rrs/MOLNG-2139/H2LVWBCX2/s_1_1_CACTCA.fastq.gz /n/analysis/Baumann/rrs/MOLNG-2139/H2LVWBCX2/s_1_2_CACTCA.fastq.gz | samtools view -Sb -@ 8 - | samtools sort - -@ 8 -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/map_to_ref_output/Atig_6993_L30699_H2LVWBCX2_1.bam) &
proc10=$!
(bwa mem -M -R "@RG\tID:Atig_9177_L30698_H2LVWBCX2_2\tSM:Atig_9177\tPL:illumina\tLB:L30698\tPU:2" -t 8 /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/rrs/MOLNG-2139/H2LVWBCX2/s_2_1_ACTGAT.fastq.gz /n/analysis/Baumann/rrs/MOLNG-2139/H2LVWBCX2/s_2_2_ACTGAT.fastq.gz | samtools view -Sb -@ 8 - | samtools sort - -@ 8 -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/map_to_ref_output/Atig_9177_L30698_H2LVWBCX2_2.bam) &
proc11=$!
(bwa mem -M -R "@RG\tID:Atig_9177_L30698_H2LVWBCX2_1\tSM:Atig_9177\tPL:illumina\tLB:L30698\tPU:1" -t 8 /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa /n/analysis/Baumann/rrs/MOLNG-2139/H2LVWBCX2/s_1_1_ACTGAT.fastq.gz /n/analysis/Baumann/rrs/MOLNG-2139/H2LVWBCX2/s_1_2_ACTGAT.fastq.gz | samtools view -Sb -@ 8 - | samtools sort - -@ 8 -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/map_to_ref_output/Atig_9177_L30698_H2LVWBCX2_1.bam) &
proc12=$!
wait "$proc10" "$proc11" "$proc12"
wait "$proc1" "$proc2" "$proc3" "$proc4" "$proc5" "$proc6" "$proc7" "$proc8" "$proc9" "$proc10" "$proc11" "$proc12"
::::::::::::::
../data/gatk_MOLNG-2139_MOLNG-2140/gatk_2_dedup_individual_2017-11-10.sh
::::::::::::::
#!/bin/bash
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/map_to_ref_output/Atig_4278_L30700_H2LVWBCX2_1.bam O=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/Atig_4278_L30700_H2LVWBCX2_1.dedup.bam M=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/Atig_4278_L30700_H2LVWBCX2_1_metrics.txt &
proc1=$!
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/map_to_ref_output/Atig_4278_L30700_H2LVWBCX2_2.bam O=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/Atig_4278_L30700_H2LVWBCX2_2.dedup.bam M=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/Atig_4278_L30700_H2LVWBCX2_2_metrics.txt &
proc2=$!
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/map_to_ref_output/A.tig_12512_L30701_H2JNNBCX2_1.bam O=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/A.tig_12512_L30701_H2JNNBCX2_1.dedup.bam M=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/A.tig_12512_L30701_H2JNNBCX2_1_metrics.txt &
proc3=$!
wait "$proc1" "$proc2" "$proc3"
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/map_to_ref_output/A.tig_12512_L30701_H2JNNBCX2_2.bam O=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/A.tig_12512_L30701_H2JNNBCX2_2.dedup.bam M=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/A.tig_12512_L30701_H2JNNBCX2_2_metrics.txt &
proc4=$!
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/map_to_ref_output/A.tig_12513_L30702_H2JNNBCX2_1.bam O=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/A.tig_12513_L30702_H2JNNBCX2_1.dedup.bam M=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/A.tig_12513_L30702_H2JNNBCX2_1_metrics.txt &
proc5=$!
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/map_to_ref_output/A.tig_12513_L30702_H2JNNBCX2_2.bam O=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/A.tig_12513_L30702_H2JNNBCX2_2.dedup.bam M=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/A.tig_12513_L30702_H2JNNBCX2_2_metrics.txt &
proc6=$!
wait "$proc4" "$proc5" "$proc6"
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/map_to_ref_output/A.tig_9721_L30703_H2JNNBCX2_1.bam O=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/A.tig_9721_L30703_H2JNNBCX2_1.dedup.bam M=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/A.tig_9721_L30703_H2JNNBCX2_1_metrics.txt &
proc7=$!
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/map_to_ref_output/A.tig_9721_L30703_H2JNNBCX2_2.bam O=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/A.tig_9721_L30703_H2JNNBCX2_2.dedup.bam M=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/A.tig_9721_L30703_H2JNNBCX2_2_metrics.txt &
proc8=$!
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/map_to_ref_output/Atig_6993_L30699_H2LVWBCX2_2.bam O=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/Atig_6993_L30699_H2LVWBCX2_2.dedup.bam M=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/Atig_6993_L30699_H2LVWBCX2_2_metrics.txt &
proc9=$!
wait "$proc7" "$proc8" "$proc9"
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/map_to_ref_output/Atig_6993_L30699_H2LVWBCX2_1.bam O=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/Atig_6993_L30699_H2LVWBCX2_1.dedup.bam M=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/Atig_6993_L30699_H2LVWBCX2_1_metrics.txt &
proc10=$!
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/map_to_ref_output/Atig_9177_L30698_H2LVWBCX2_2.bam O=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/Atig_9177_L30698_H2LVWBCX2_2.dedup.bam M=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/Atig_9177_L30698_H2LVWBCX2_2_metrics.txt &
proc11=$!
java -Xmx4g -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/map_to_ref_output/Atig_9177_L30698_H2LVWBCX2_1.bam O=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/Atig_9177_L30698_H2LVWBCX2_1.dedup.bam M=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/Atig_9177_L30698_H2LVWBCX2_1_metrics.txt &
proc12=$!
wait "$proc10" "$proc11" "$proc12"
wait "$proc1" "$proc2" "$proc3" "$proc4" "$proc5" "$proc6" "$proc7" "$proc8" "$proc9" "$proc10" "$proc11" "$proc12"
::::::::::::::
../data/gatk_MOLNG-2139_MOLNG-2140/gatk_3_merge_bams_2017-11-10.sh
::::::::::::::
#!/bin/bash
(samtools merge -@ 8 - /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/Atig_4278_L30700_H2LVWBCX2_1.dedup.bam /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/Atig_4278_L30700_H2LVWBCX2_2.dedup.bam | samtools sort - -m 10G -@ 8 -T /scratch/dut/Atig_4278.temp -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/merged_bams/Atig_4278.merged.bam; samtools index /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/merged_bams/Atig_4278.merged.bam) &
proc1=$!
(samtools merge -@ 8 - /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/A.tig_12512_L30701_H2JNNBCX2_1.dedup.bam /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/A.tig_12512_L30701_H2JNNBCX2_2.dedup.bam | samtools sort - -m 10G -@ 8 -T /scratch/dut/A.tig_12512.temp -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/merged_bams/A.tig_12512.merged.bam; samtools index /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/merged_bams/A.tig_12512.merged.bam) &
proc2=$!
(samtools merge -@ 8 - /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/A.tig_12513_L30702_H2JNNBCX2_1.dedup.bam /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/A.tig_12513_L30702_H2JNNBCX2_2.dedup.bam | samtools sort - -m 10G -@ 8 -T /scratch/dut/A.tig_12513.temp -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/merged_bams/A.tig_12513.merged.bam; samtools index /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/merged_bams/A.tig_12513.merged.bam) &
proc3=$!
wait "$proc1" "$proc2" "$proc3"
(samtools merge -@ 8 - /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/A.tig_9721_L30703_H2JNNBCX2_1.dedup.bam /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/A.tig_9721_L30703_H2JNNBCX2_2.dedup.bam | samtools sort - -m 10G -@ 8 -T /scratch/dut/A.tig_9721.temp -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/merged_bams/A.tig_9721.merged.bam; samtools index /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/merged_bams/A.tig_9721.merged.bam) &
proc4=$!
(samtools merge -@ 8 - /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/Atig_6993_L30699_H2LVWBCX2_2.dedup.bam /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/Atig_6993_L30699_H2LVWBCX2_1.dedup.bam | samtools sort - -m 10G -@ 8 -T /scratch/dut/Atig_6993.temp -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/merged_bams/Atig_6993.merged.bam; samtools index /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/merged_bams/Atig_6993.merged.bam) &
proc5=$!
(samtools merge -@ 8 - /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/Atig_9177_L30698_H2LVWBCX2_2.dedup.bam /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_individuals/Atig_9177_L30698_H2LVWBCX2_1.dedup.bam | samtools sort - -m 10G -@ 8 -T /scratch/dut/Atig_9177.temp -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/merged_bams/Atig_9177.merged.bam; samtools index /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/merged_bams/Atig_9177.merged.bam) &
proc6=$!
wait "$proc4" "$proc5" "$proc6"
wait "$proc1" "$proc2" "$proc3" "$proc4" "$proc5" "$proc6"
::::::::::::::
../data/gatk_MOLNG-2139_MOLNG-2140/gatk_4_dedup_merged_2017-11-10.sh
::::::::::::::
#!/bin/bash
java -Xmx4g -Djava.io.tmpdir=/scratch/dut -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/merged_bams/Atig_4278.merged.bam O=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/Atig_4278.merged.dedup.bam M=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/Atig_4278.merged_metrics.txt &
proc1=$!
java -Xmx4g -Djava.io.tmpdir=/scratch/dut  -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/merged_bams/A.tig_12512.merged.bam O=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/A.tig_12512.merged.dedup.bam M=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/A.tig_12512.merged_metrics.txt &
proc2=$!
java -Xmx4g -Djava.io.tmpdir=/scratch/dut -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/merged_bams/A.tig_12513.merged.bam O=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/A.tig_12513.merged.dedup.bam M=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/A.tig_12513.merged_metrics.txt &
proc3=$!
wait "$proc1" "$proc2" "$proc3"
java -Xmx4g -Djava.io.tmpdir=/scratch/dut -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/merged_bams/A.tig_9721.merged.bam O=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/A.tig_9721.merged.dedup.bam M=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/A.tig_9721.merged_metrics.txt &
proc4=$!
java -Xmx4g -Djava.io.tmpdir=/scratch/dut -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/merged_bams/Atig_6993.merged.bam O=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/Atig_6993.merged.dedup.bam M=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/Atig_6993.merged_metrics.txt &
proc5=$!
java -Xmx4g -Djava.io.tmpdir=/scratch/dut -jar /home/dut/bin/picard-tools-1.119//MarkDuplicates.jar I=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/merged_bams/Atig_9177.merged.bam O=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/Atig_9177.merged.dedup.bam M=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/Atig_9177.merged_metrics.txt &
proc6=$!
wait "$proc4" "$proc5" "$proc6"
wait "$proc1" "$proc2" "$proc3" "$proc4" "$proc5" "$proc6"
::::::::::::::
../data/gatk_MOLNG-2139_MOLNG-2140/gatk_5_index_2017-11-10.sh
::::::::::::::
#!/bin/bash
samtools index -b /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/Atig_4278.merged.dedup.bam &
proc1=$!
samtools index -b /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/A.tig_12512.merged.dedup.bam &
proc2=$!
samtools index -b /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/A.tig_12513.merged.dedup.bam &
proc3=$!
wait "$proc1" "$proc2" "$proc3"
samtools index -b /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/A.tig_9721.merged.dedup.bam &
proc4=$!
samtools index -b /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/Atig_6993.merged.dedup.bam &
proc5=$!
samtools index -b /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/Atig_9177.merged.dedup.bam &
proc6=$!
wait "$proc4" "$proc5" "$proc6"
wait "$proc1" "$proc2" "$proc3" "$proc4" "$proc5" "$proc6"
::::::::::::::
../data/gatk_MOLNG-2139_MOLNG-2140/gatk_6_realign_bams_2017-11-10.sh
::::::::::::::
#!/bin/bash
(java -Xmx3g -Djava.io.tmpdir=/scratch/dut -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T RealignerTargetCreator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/Atig_4278.merged.dedup.bam -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_4278.merged.dedup_target_intervals.list -nt 8; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T IndelRealigner -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/Atig_4278.merged.dedup.bam -targetIntervals /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_4278.merged.dedup_target_intervals.list -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_4278.merged.dedup.realigned.bam) &
proc1=$!
(java -Xmx3g -Djava.io.tmpdir=/scratch/dut -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T RealignerTargetCreator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/A.tig_12512.merged.dedup.bam -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_12512.merged.dedup_target_intervals.list -nt 8; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T IndelRealigner -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/A.tig_12512.merged.dedup.bam -targetIntervals /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_12512.merged.dedup_target_intervals.list -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_12512.merged.dedup.realigned.bam) &
proc2=$!
(java -Xmx3g -Djava.io.tmpdir=/scratch/dut -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T RealignerTargetCreator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/A.tig_12513.merged.dedup.bam -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_12513.merged.dedup_target_intervals.list -nt 8; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T IndelRealigner -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/A.tig_12513.merged.dedup.bam -targetIntervals /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_12513.merged.dedup_target_intervals.list -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_12513.merged.dedup.realigned.bam) &
proc3=$!
wait "$proc1" "$proc2" "$proc3"
(java -Xmx3g -Djava.io.tmpdir=/scratch/dut -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T RealignerTargetCreator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/A.tig_9721.merged.dedup.bam -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_9721.merged.dedup_target_intervals.list -nt 8; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T IndelRealigner -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/A.tig_9721.merged.dedup.bam -targetIntervals /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_9721.merged.dedup_target_intervals.list -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_9721.merged.dedup.realigned.bam) &
proc4=$!
(java -Xmx3g -Djava.io.tmpdir=/scratch/dut -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T RealignerTargetCreator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/Atig_6993.merged.dedup.bam -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_6993.merged.dedup_target_intervals.list -nt 8; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T IndelRealigner -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/Atig_6993.merged.dedup.bam -targetIntervals /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_6993.merged.dedup_target_intervals.list -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_6993.merged.dedup.realigned.bam) &
proc5=$!
(java -Xmx3g -Djava.io.tmpdir=/scratch/dut -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T RealignerTargetCreator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/Atig_9177.merged.dedup.bam -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_9177.merged.dedup_target_intervals.list -nt 8; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T IndelRealigner -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/dedup_merged/Atig_9177.merged.dedup.bam -targetIntervals /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_9177.merged.dedup_target_intervals.list -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_9177.merged.dedup.realigned.bam) &
proc6=$!
wait "$proc4" "$proc5" "$proc6"
wait "$proc1" "$proc2" "$proc3" "$proc4" "$proc5" "$proc6"
::::::::::::::
../data/gatk_MOLNG-2139_MOLNG-2140/gatk_7_training_2017-11-10.sh
::::::::::::::
#!/bin/bash
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_4278.merged.dedup.realigned.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.0_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.0_raw_var.vcf -selectType SNP -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.0_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.0_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.0_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.0_raw_var.vcf -selectType INDEL -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.0_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.0_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.0_filt_indels.vcf) &
proc0=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_12512.merged.dedup.realigned.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.0_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.0_raw_var.vcf -selectType SNP -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.0_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.0_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.0_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.0_raw_var.vcf -selectType INDEL -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.0_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.0_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.0_filt_indels.vcf) &
proc1=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_12513.merged.dedup.realigned.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.0_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.0_raw_var.vcf -selectType SNP -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.0_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.0_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.0_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.0_raw_var.vcf -selectType INDEL -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.0_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.0_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.0_filt_indels.vcf) &
proc2=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_9721.merged.dedup.realigned.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.0_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.0_raw_var.vcf -selectType SNP -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.0_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.0_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.0_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.0_raw_var.vcf -selectType INDEL -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.0_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.0_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.0_filt_indels.vcf) &
proc3=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_6993.merged.dedup.realigned.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.0_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.0_raw_var.vcf -selectType SNP -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.0_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.0_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.0_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.0_raw_var.vcf -selectType INDEL -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.0_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.0_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.0_filt_indels.vcf) &
proc4=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_9177.merged.dedup.realigned.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.0_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.0_raw_var.vcf -selectType SNP -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.0_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.0_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.0_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.0_raw_var.vcf -selectType INDEL -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.0_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.0_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.0_filt_indels.vcf) &
proc5=$!
wait "$proc0" "$proc1" "$proc2" "$proc3" "$proc4" "$proc5"
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -T CombineVariants --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.0_filt_snps.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.0_filt_snps.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.0_filt_snps.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.0_filt_snps.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.0_filt_snps.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.0_filt_snps.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_0.vcf --excludeNonVariants --minimumN 1
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -T CombineVariants --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.0_filt_indels.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.0_filt_indels.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.0_filt_indels.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.0_filt_indels.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.0_filt_indels.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.0_filt_indels.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_0.vcf --excludeNonVariants --minimumN 1
vcftools --vcf /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_0.vcf --remove-filtered LowQual --remove-filtered default_snp_filter --recode --recode-INFO-all --out /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_0
vcftools --vcf /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_0.vcf --remove-filtered LowQual --remove-filtered default_indel_filter --recode --recode-INFO-all --out /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_0
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_4278.merged.dedup.realigned.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_0.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_0.recode.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278_0.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_4278.merged.dedup.realigned.bam -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278_0.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278.merged.dedup.realigned.recal_0.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_4278.merged.dedup.realigned.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_0.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_0.recode.vcf -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278_0.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278_0.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -before /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278_0.before_table -after /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278_0.after_table -plots /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278_0.plots) &
proc6=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_12512.merged.dedup.realigned.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_0.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_0.recode.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512_0.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_12512.merged.dedup.realigned.bam -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512_0.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512.merged.dedup.realigned.recal_0.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_12512.merged.dedup.realigned.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_0.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_0.recode.vcf -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512_0.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512_0.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -before /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512_0.before_table -after /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512_0.after_table -plots /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512_0.plots) &
proc7=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_12513.merged.dedup.realigned.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_0.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_0.recode.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513_0.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_12513.merged.dedup.realigned.bam -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513_0.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513.merged.dedup.realigned.recal_0.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_12513.merged.dedup.realigned.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_0.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_0.recode.vcf -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513_0.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513_0.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -before /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513_0.before_table -after /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513_0.after_table -plots /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513_0.plots) &
proc8=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_9721.merged.dedup.realigned.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_0.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_0.recode.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721_0.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_9721.merged.dedup.realigned.bam -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721_0.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721.merged.dedup.realigned.recal_0.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_9721.merged.dedup.realigned.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_0.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_0.recode.vcf -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721_0.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721_0.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -before /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721_0.before_table -after /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721_0.after_table -plots /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721_0.plots) &
proc9=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_6993.merged.dedup.realigned.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_0.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_0.recode.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993_0.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_6993.merged.dedup.realigned.bam -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993_0.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993.merged.dedup.realigned.recal_0.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_6993.merged.dedup.realigned.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_0.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_0.recode.vcf -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993_0.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993_0.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -before /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993_0.before_table -after /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993_0.after_table -plots /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993_0.plots) &
proc10=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_9177.merged.dedup.realigned.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_0.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_0.recode.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177_0.before_table;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_9177.merged.dedup.realigned.bam -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177_0.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177.merged.dedup.realigned.recal_0.bam;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_9177.merged.dedup.realigned.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_0.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_0.recode.vcf -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177_0.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177_0.after_table;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -before /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177_0.before_table -after /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177_0.after_table -plots /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177_0.plots) &
proc11=$!
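# Block until the backgrounded jobs started above have finished.
# Note: given several PIDs, wait returns the exit status of the last one only, so a failure in an earlier job is not detected here.
wait "$proc6" "$proc7" "$proc8" "$proc9" "$proc10" "$proc11"
# Round 1 variant calling: run HaplotypeCaller on each recal_0 BAM, then split the calls into SNPs and indels and hard-filter each set.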
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278.merged.dedup.realigned.recal_0.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.1_raw_var.vcf;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.1_raw_var.vcf -selectType SNP -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.1_raw_snps.vcf;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.1_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.1_filt_snps.vcf;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.1_raw_var.vcf -selectType INDEL -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.1_raw_indels.vcf;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.1_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.1_filt_indels.vcf) &
proc12=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512.merged.dedup.realigned.recal_0.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.1_raw_var.vcf;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.1_raw_var.vcf -selectType SNP -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.1_raw_snps.vcf;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.1_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.1_filt_snps.vcf;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.1_raw_var.vcf -selectType INDEL -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.1_raw_indels.vcf;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.1_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.1_filt_indels.vcf) &
proc13=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513.merged.dedup.realigned.recal_0.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.1_raw_var.vcf;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.1_raw_var.vcf -selectType SNP -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.1_raw_snps.vcf;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.1_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.1_filt_snps.vcf;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.1_raw_var.vcf -selectType INDEL -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.1_raw_indels.vcf;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.1_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.1_filt_indels.vcf) &
proc14=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721.merged.dedup.realigned.recal_0.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.1_raw_var.vcf;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.1_raw_var.vcf -selectType SNP -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.1_raw_snps.vcf;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.1_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.1_filt_snps.vcf;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.1_raw_var.vcf -selectType INDEL -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.1_raw_indels.vcf;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.1_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.1_filt_indels.vcf) &
proc15=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993.merged.dedup.realigned.recal_0.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.1_raw_var.vcf;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.1_raw_var.vcf -selectType SNP -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.1_raw_snps.vcf;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.1_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.1_filt_snps.vcf;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.1_raw_var.vcf -selectType INDEL -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.1_raw_indels.vcf;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.1_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.1_filt_indels.vcf) &
proc16=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177.merged.dedup.realigned.recal_0.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.1_raw_var.vcf;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.1_raw_var.vcf -selectType SNP -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.1_raw_snps.vcf;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.1_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.1_filt_snps.vcf;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.1_raw_var.vcf -selectType INDEL -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.1_raw_indels.vcf;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.1_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.1_filt_indels.vcf) &
proc17=$!
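# Block until all six per-sample variant-calling jobs have finished.
wait "$proc12" "$proc13" "$proc14" "$proc15" "$proc16" "$proc17"
# Merge the per-sample filtered SNP and indel calls into all_snps_1.vcf and all_indels_1.vcf.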
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -T CombineVariants --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.1_filt_snps.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.1_filt_snps.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.1_filt_snps.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.1_filt_snps.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.1_filt_snps.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.1_filt_snps.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_1.vcf --excludeNonVariants --minimumN 1
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -T CombineVariants --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.1_filt_indels.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.1_filt_indels.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.1_filt_indels.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.1_filt_indels.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.1_filt_indels.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.1_filt_indels.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_1.vcf --excludeNonVariants --minimumN 1
# Drop records flagged LowQual or failing the hard filters; vcftools writes the kept records to all_snps_1.recode.vcf and all_indels_1.recode.vcf.
vcftools --vcf /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_1.vcf --remove-filtered LowQual --remove-filtered default_snp_filter --recode --recode-INFO-all --out /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_1
vcftools --vcf /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_1.vcf --remove-filtered LowQual --remove-filtered default_indel_filter --recode --recode-INFO-all --out /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_1
# Round 1 base recalibration: use the recoded round-1 SNP and indel sets as known sites for BaseRecalibrator.
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278.merged.dedup.realigned.recal_0.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_1.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_1.recode.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278_1.before_table;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278.merged.dedup.realigned.recal_0.bam -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278_1.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278.merged.dedup.realigned.recal_0.recal_1.bam;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278.merged.dedup.realigned.recal_0.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_1.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_1.recode.vcf -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278_1.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278_1.after_table;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -before /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278_1.before_table -after /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278_1.after_table -plots /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278_1.plots) &
proc18=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512.merged.dedup.realigned.recal_0.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_1.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_1.recode.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512_1.before_table;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512.merged.dedup.realigned.recal_0.bam -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512_1.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512.merged.dedup.realigned.recal_0.recal_1.bam;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512.merged.dedup.realigned.recal_0.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_1.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_1.recode.vcf -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512_1.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512_1.after_table;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -before /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512_1.before_table -after /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512_1.after_table -plots /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512_1.plots) &
proc19=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513.merged.dedup.realigned.recal_0.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_1.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_1.recode.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513_1.before_table;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513.merged.dedup.realigned.recal_0.bam -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513_1.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513.merged.dedup.realigned.recal_0.recal_1.bam;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513.merged.dedup.realigned.recal_0.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_1.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_1.recode.vcf -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513_1.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513_1.after_table;
 java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -before /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513_1.before_table -after /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513_1.after_table -plots /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513_1.plots) &
proc20=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721.merged.dedup.realigned.recal_0.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_1.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_1.recode.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721_1.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721.merged.dedup.realigned.recal_0.bam -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721_1.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721.merged.dedup.realigned.recal_0.recal_1.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721.merged.dedup.realigned.recal_0.bam -knownSites 
/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_1.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_1.recode.vcf -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721_1.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721_1.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -before /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721_1.before_table -after /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721_1.after_table -plots /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721_1.plots) &
proc21=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993.merged.dedup.realigned.recal_0.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_1.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_1.recode.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993_1.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993.merged.dedup.realigned.recal_0.bam -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993_1.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993.merged.dedup.realigned.recal_0.recal_1.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993.merged.dedup.realigned.recal_0.bam -knownSites 
/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_1.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_1.recode.vcf -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993_1.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993_1.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -before /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993_1.before_table -after /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993_1.after_table -plots /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993_1.plots) &
proc22=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177.merged.dedup.realigned.recal_0.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_1.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_1.recode.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177_1.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177.merged.dedup.realigned.recal_0.bam -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177_1.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177.merged.dedup.realigned.recal_0.recal_1.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177.merged.dedup.realigned.recal_0.bam -knownSites 
/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_1.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_1.recode.vcf -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177_1.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177_1.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -before /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177_1.before_table -after /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177_1.after_table -plots /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177_1.plots) &
proc23=$!
wait "$proc18" "$proc19" "$proc20" "$proc21" "$proc22" "$proc23"
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278.merged.dedup.realigned.recal_0.recal_1.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.2_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.2_raw_var.vcf -selectType SNP -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.2_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.2_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.2_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V 
/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.2_raw_var.vcf -selectType INDEL -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.2_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.2_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.2_filt_indels.vcf) &
proc24=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512.merged.dedup.realigned.recal_0.recal_1.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.2_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.2_raw_var.vcf -selectType SNP -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.2_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.2_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.2_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V 
/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.2_raw_var.vcf -selectType INDEL -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.2_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.2_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.2_filt_indels.vcf) &
proc25=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513.merged.dedup.realigned.recal_0.recal_1.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.2_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.2_raw_var.vcf -selectType SNP -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.2_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.2_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.2_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V 
/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.2_raw_var.vcf -selectType INDEL -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.2_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.2_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.2_filt_indels.vcf) &
proc26=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721.merged.dedup.realigned.recal_0.recal_1.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.2_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.2_raw_var.vcf -selectType SNP -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.2_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.2_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.2_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V 
/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.2_raw_var.vcf -selectType INDEL -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.2_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.2_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.2_filt_indels.vcf) &
proc27=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993.merged.dedup.realigned.recal_0.recal_1.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.2_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.2_raw_var.vcf -selectType SNP -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.2_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.2_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.2_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V 
/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.2_raw_var.vcf -selectType INDEL -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.2_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.2_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.2_filt_indels.vcf) &
proc28=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177.merged.dedup.realigned.recal_0.recal_1.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.2_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.2_raw_var.vcf -selectType SNP -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.2_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.2_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.2_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V 
/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.2_raw_var.vcf -selectType INDEL -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.2_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.2_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.2_filt_indels.vcf) &
proc29=$!
wait "$proc24" "$proc25" "$proc26" "$proc27" "$proc28" "$proc29"
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -T CombineVariants --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.2_filt_snps.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.2_filt_snps.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.2_filt_snps.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.2_filt_snps.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.2_filt_snps.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.2_filt_snps.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_2.vcf --excludeNonVariants --minimumN 1
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -T CombineVariants --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.2_filt_indels.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.2_filt_indels.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.2_filt_indels.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.2_filt_indels.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.2_filt_indels.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.2_filt_indels.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_2.vcf --excludeNonVariants --minimumN 1
vcftools --vcf /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_2.vcf --remove-filtered LowQual --remove-filtered default_snp_filter --recode --recode-INFO-all --out /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_2
vcftools --vcf /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_2.vcf --remove-filtered LowQual --remove-filtered default_indel_filter --recode --recode-INFO-all --out /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_2
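# Note on the pattern in the surrounding commands: each recalibration round N reads the
# BAM written by round N-1, uses the filtered all_snps_N / all_indels_N call sets as
# -knownSites, and writes a BAM with ".recal_N" appended. The helper below is a minimal
# sketch of that naming scheme only. The function name bqsr_round_paths and its output
# labels are hypothetical and not part of the original pipeline; the file-name pattern is
# inferred from rounds 1 and 2 as they appear in this log. It prints names and runs nothing.

```shell
# Hypothetical helper: print the file names that recalibration round N
# reads and writes for one sample (directories omitted for brevity).
bqsr_round_paths() {
    sample=$1   # sample prefix, e.g. Atig_4278
    round=$2    # recalibration round, e.g. 1 or 2

    # The input BAM carries one ".recal_i" suffix per completed round (0 .. N-1).
    suffix="merged.dedup.realigned"
    i=0
    while [ "$i" -lt "$round" ]; do
        suffix="${suffix}.recal_${i}"
        i=$((i + 1))
    done

    echo "input_bam=${sample}.${suffix}.bam"
    echo "known_snps=all_snps_${round}.recode.vcf"
    echo "known_indels=all_indels_${round}.recode.vcf"
    echo "before_table=${sample}_${round}.before_table"
    echo "after_table=${sample}_${round}.after_table"
    echo "output_bam=${sample}.${suffix}.recal_${round}.bam"
}
```

# Usage: "bqsr_round_paths Atig_4278 2" lists the inputs and outputs of the round-2
# BaseRecalibrator / PrintReads commands that follow for that sample.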
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278.merged.dedup.realigned.recal_0.recal_1.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_2.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_2.recode.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278_2.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278.merged.dedup.realigned.recal_0.recal_1.bam -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278_2.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278.merged.dedup.realigned.recal_0.recal_1.recal_2.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278.merged.dedup.realigned.recal_0.recal_1.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_2.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_2.recode.vcf -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278_2.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278_2.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -before /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278_2.before_table -after /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278_2.after_table -plots /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278_2.plots) &
proc30=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512.merged.dedup.realigned.recal_0.recal_1.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_2.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_2.recode.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512_2.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512.merged.dedup.realigned.recal_0.recal_1.bam -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512_2.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512.merged.dedup.realigned.recal_0.recal_1.recal_2.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512.merged.dedup.realigned.recal_0.recal_1.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_2.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_2.recode.vcf -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512_2.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512_2.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -before /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512_2.before_table -after /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512_2.after_table -plots /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512_2.plots) &
proc31=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513.merged.dedup.realigned.recal_0.recal_1.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_2.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_2.recode.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513_2.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513.merged.dedup.realigned.recal_0.recal_1.bam -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513_2.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513.merged.dedup.realigned.recal_0.recal_1.recal_2.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513.merged.dedup.realigned.recal_0.recal_1.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_2.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_2.recode.vcf -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513_2.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513_2.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -before /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513_2.before_table -after /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513_2.after_table -plots /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513_2.plots) &
proc32=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721.merged.dedup.realigned.recal_0.recal_1.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_2.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_2.recode.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721_2.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721.merged.dedup.realigned.recal_0.recal_1.bam -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721_2.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721.merged.dedup.realigned.recal_0.recal_1.recal_2.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721.merged.dedup.realigned.recal_0.recal_1.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_2.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_2.recode.vcf -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721_2.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721_2.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -before /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721_2.before_table -after /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721_2.after_table -plots /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721_2.plots) &
proc33=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993.merged.dedup.realigned.recal_0.recal_1.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_2.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_2.recode.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993_2.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993.merged.dedup.realigned.recal_0.recal_1.bam -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993_2.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993.merged.dedup.realigned.recal_0.recal_1.recal_2.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993.merged.dedup.realigned.recal_0.recal_1.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_2.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_2.recode.vcf -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993_2.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993_2.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -before /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993_2.before_table -after /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993_2.after_table -plots /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993_2.plots) &
proc34=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177.merged.dedup.realigned.recal_0.recal_1.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_2.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_2.recode.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177_2.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177.merged.dedup.realigned.recal_0.recal_1.bam -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177_2.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177.merged.dedup.realigned.recal_0.recal_1.recal_2.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177.merged.dedup.realigned.recal_0.recal_1.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_2.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_2.recode.vcf -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177_2.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177_2.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -before /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177_2.before_table -after /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177_2.after_table -plots /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177_2.plots) &
proc35=$!
wait "$proc30" "$proc31" "$proc32" "$proc33" "$proc34" "$proc35"
# Call variants on each recal_2 BAM with HaplotypeCaller, split the calls into SNPs and indels, and flag calls that fail the hard-filter expressions. One background subshell per sample.
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_4278.merged.dedup.realigned.recal_0.recal_1.recal_2.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.3_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.3_raw_var.vcf -selectType SNP -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.3_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.3_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.3_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.3_raw_var.vcf -selectType INDEL -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.3_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.3_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.3_filt_indels.vcf) &
proc36=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12512.merged.dedup.realigned.recal_0.recal_1.recal_2.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.3_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.3_raw_var.vcf -selectType SNP -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.3_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.3_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.3_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.3_raw_var.vcf -selectType INDEL -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.3_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.3_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.3_filt_indels.vcf) &
proc37=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_12513.merged.dedup.realigned.recal_0.recal_1.recal_2.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.3_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.3_raw_var.vcf -selectType SNP -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.3_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.3_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.3_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.3_raw_var.vcf -selectType INDEL -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.3_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.3_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.3_filt_indels.vcf) &
proc38=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/A.tig_9721.merged.dedup.realigned.recal_0.recal_1.recal_2.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.3_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.3_raw_var.vcf -selectType SNP -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.3_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.3_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.3_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.3_raw_var.vcf -selectType INDEL -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.3_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.3_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.3_filt_indels.vcf) &
proc39=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_6993.merged.dedup.realigned.recal_0.recal_1.recal_2.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.3_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.3_raw_var.vcf -selectType SNP -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.3_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.3_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.3_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.3_raw_var.vcf -selectType INDEL -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.3_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.3_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.3_filt_indels.vcf) &
proc40=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/recal_bams/Atig_9177.merged.dedup.realigned.recal_0.recal_1.recal_2.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.3_raw_var.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.3_raw_var.vcf -selectType SNP -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.3_raw_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.3_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.3_filt_snps.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.3_raw_var.vcf -selectType INDEL -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.3_raw_indels.vcf; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.3_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.3_filt_indels.vcf) &
proc41=$!
wait "$proc36" "$proc37" "$proc38" "$proc39" "$proc40" "$proc41"
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -T CombineVariants --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.3_filt_snps.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.3_filt_snps.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.3_filt_snps.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.3_filt_snps.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.3_filt_snps.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.3_filt_snps.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_3.vcf --excludeNonVariants --minimumN 1
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -T CombineVariants --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_4278.3_filt_indels.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12512.3_filt_indels.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_12513.3_filt_indels.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/A.tig_9721.3_filt_indels.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_6993.3_filt_indels.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/Atig_9177.3_filt_indels.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_3.vcf --excludeNonVariants --minimumN 1
vcftools --vcf /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_3.vcf --remove-filtered LowQual --remove-filtered default_snp_filter --recode --recode-INFO-all --out /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_3
vcftools --vcf /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_3.vcf --remove-filtered LowQual --remove-filtered default_indel_filter --recode --recode-INFO-all --out /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_3
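# Illustrative sketch, not part of the generated script: the two CombineVariants
# calls above differ only in the variant class (snps / indels), so the --variant
# list could be assembled in a loop over the sample names. The variable names
# (gatk, base, ref, vc_dir, samples) and the combine_cmd helper are hypothetical.
# The helper only prints the command (a dry run), so it can be inspected before
# anything is executed.
gatk=/home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar
base=/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140
ref=$base/reference/tigris_scaffolds_filt_10000.fa
vc_dir=$base/training/variant_calling
samples="Atig_4278 A.tig_12512 A.tig_12513 A.tig_9721 Atig_6993 Atig_9177"

combine_cmd() {
    # $1 is the variant class used in the file names: "snps" or "indels"
    kind=$1
    variant_args=""
    for s in $samples; do
        # one --variant argument per sample, in the same order as above
        variant_args="$variant_args --variant $vc_dir/$s.3_filt_$kind.vcf"
    done
    echo "java -Xmx3g -jar $gatk -R $ref -T CombineVariants$variant_args -o $vc_dir/all_${kind}_3.vcf --excludeNonVariants --minimumN 1"
}

combine_cmd snps
combine_cmd indels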
::::::::::::::
../data/gatk_MOLNG-2139_MOLNG-2140/gatk_8_recalibrate_bams_2017-11-10.sh
::::::::::::::
#!/bin/bash
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_4278.merged.dedup.realigned.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_3.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_3.recode.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/Atig_4278_final.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_4278.merged.dedup.realigned.bam -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/Atig_4278_final.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/Atig_4278.merged.dedup.realigned.recal_final.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_4278.merged.dedup.realigned.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_3.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_3.recode.vcf -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/Atig_4278_final.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/Atig_4278_final.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -before /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/Atig_4278_final.before_table -after /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/Atig_4278_final.after_table -plots /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/Atig_4278_final.plots) &
proc1=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_12512.merged.dedup.realigned.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_3.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_3.recode.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/A.tig_12512_final.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_12512.merged.dedup.realigned.bam -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/A.tig_12512_final.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/A.tig_12512.merged.dedup.realigned.recal_final.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_12512.merged.dedup.realigned.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_3.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_3.recode.vcf -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/A.tig_12512_final.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/A.tig_12512_final.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -before /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/A.tig_12512_final.before_table -after /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/A.tig_12512_final.after_table -plots /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/A.tig_12512_final.plots) &
proc2=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_12513.merged.dedup.realigned.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_3.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_3.recode.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/A.tig_12513_final.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_12513.merged.dedup.realigned.bam -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/A.tig_12513_final.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/A.tig_12513.merged.dedup.realigned.recal_final.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_12513.merged.dedup.realigned.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_3.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_3.recode.vcf -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/A.tig_12513_final.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/A.tig_12513_final.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -before /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/A.tig_12513_final.before_table -after /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/A.tig_12513_final.after_table -plots /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/A.tig_12513_final.plots) &
proc3=$!
wait "$proc1" "$proc2" "$proc3"
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_9721.merged.dedup.realigned.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_3.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_3.recode.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/A.tig_9721_final.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_9721.merged.dedup.realigned.bam -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/A.tig_9721_final.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/A.tig_9721.merged.dedup.realigned.recal_final.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/A.tig_9721.merged.dedup.realigned.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_3.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_3.recode.vcf -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/A.tig_9721_final.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/A.tig_9721_final.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -before /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/A.tig_9721_final.before_table -after /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/A.tig_9721_final.after_table -plots /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/A.tig_9721_final.plots) &
proc4=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_6993.merged.dedup.realigned.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_3.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_3.recode.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/Atig_6993_final.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_6993.merged.dedup.realigned.bam -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/Atig_6993_final.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/Atig_6993.merged.dedup.realigned.recal_final.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_6993.merged.dedup.realigned.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_3.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_3.recode.vcf -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/Atig_6993_final.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/Atig_6993_final.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -before /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/Atig_6993_final.before_table -after /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/Atig_6993_final.after_table -plots /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/Atig_6993_final.plots) &
proc5=$!
(java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_9177.merged.dedup.realigned.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_3.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_3.recode.vcf -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/Atig_9177_final.before_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T PrintReads -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_9177.merged.dedup.realigned.bam -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/Atig_9177_final.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/Atig_9177.merged.dedup.realigned.recal_final.bam; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T BaseRecalibrator -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/Atig_9177.merged.dedup.realigned.bam -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_snps_3.recode.vcf -knownSites /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/training/variant_calling/all_indels_3.recode.vcf -BQSR /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/Atig_9177_final.before_table -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/Atig_9177_final.after_table; java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T AnalyzeCovariates -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -before /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/Atig_9177_final.before_table -after /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/Atig_9177_final.after_table -plots /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/Atig_9177_final.plots) &
proc6=$!
wait "$proc4" "$proc5" "$proc6"
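# proc1..proc6 were already waited on in the two batches above; this final wait is a redundant safeguard.
wait "$proc1" "$proc2" "$proc3" "$proc4" "$proc5" "$proc6"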
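# Illustrative sketch, not part of the generated script: the script above starts
# the per-sample jobs as background subshells in batches of three and waits for
# each batch before starting the next. The same pattern can be written as a loop.
# run_sample and run_in_batches are hypothetical helpers; run_sample only echoes
# the sample name here and stands in for the four-step recalibration chain.
run_sample() {
    echo "recalibrating $1"
}

run_in_batches() {
    # $1 = batch size; remaining arguments = sample names
    batch_size=$1
    shift
    pids=""
    n=0
    for s in "$@"; do
        ( run_sample "$s" ) &
        pids="$pids $!"
        n=$((n + 1))
        if [ "$n" -eq "$batch_size" ]; then
            wait $pids      # block until the whole batch has finished
            pids=""
            n=0
        fi
    done
    # wait for a final, partially filled batch (if any)
    if [ -n "$pids" ]; then
        wait $pids
    fi
}

run_in_batches 3 Atig_4278 A.tig_12512 A.tig_12513 A.tig_9721 Atig_6993 Atig_9177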
::::::::::::::
../data/gatk_MOLNG-2139_MOLNG-2140/gatk_9_final_variant_calling_2017-11-10.sh
::::::::::::::
#!/bin/bash
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller --variant_index_type LINEAR --variant_index_parameter 128000 -ERC GVCF -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/Atig_4278.merged.dedup.realigned.recal_final.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/hc_variant_calling/Atig_4278.final_raw_var.vcf &
proc0=$!
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller --variant_index_type LINEAR --variant_index_parameter 128000 -ERC GVCF -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/A.tig_12512.merged.dedup.realigned.recal_final.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/hc_variant_calling/A.tig_12512.final_raw_var.vcf &
proc1=$!
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller --variant_index_type LINEAR --variant_index_parameter 128000 -ERC GVCF -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/A.tig_12513.merged.dedup.realigned.recal_final.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/hc_variant_calling/A.tig_12513.final_raw_var.vcf &
proc2=$!
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller --variant_index_type LINEAR --variant_index_parameter 128000 -ERC GVCF -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/A.tig_9721.merged.dedup.realigned.recal_final.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/hc_variant_calling/A.tig_9721.final_raw_var.vcf &
proc3=$!
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller --variant_index_type LINEAR --variant_index_parameter 128000 -ERC GVCF -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/Atig_6993.merged.dedup.realigned.recal_final.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/hc_variant_calling/Atig_6993.final_raw_var.vcf &
proc4=$!
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T HaplotypeCaller --variant_index_type LINEAR --variant_index_parameter 128000 -ERC GVCF -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -I /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/recal_bams/Atig_9177.merged.dedup.realigned.recal_final.bam  -stand_call_conf 30 -stand_emit_conf 30 -mbq 17  -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/hc_variant_calling/Atig_9177.final_raw_var.vcf &
proc5=$!
wait "$proc0" "$proc1" "$proc2" "$proc3" "$proc4" "$proc5"
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T GenotypeGVCFs -nt 24 -ploidy 2 -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/hc_variant_calling/Atig_4278.final_raw_var.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/hc_variant_calling/A.tig_12512.final_raw_var.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/hc_variant_calling/A.tig_12513.final_raw_var.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/hc_variant_calling/A.tig_9721.final_raw_var.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/hc_variant_calling/Atig_6993.final_raw_var.vcf --variant /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/hc_variant_calling/Atig_9177.final_raw_var.vcf -allSites -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/joint_genotypes/jg_Atig_4278_A.tig_12512_A.tig_12513_A.tig_9721_Atig_6993_Atig_9177.gvcf
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/joint_genotypes/jg_Atig_4278_A.tig_12512_A.tig_12513_A.tig_9721_Atig_6993_Atig_9177.gvcf -selectType SNP -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/joint_genotypes/Atig_4278_A.tig_12512_A.tig_12513_A.tig_9721_Atig_6993_Atig_9177.final_raw_snps.vcf
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/joint_genotypes/Atig_4278_A.tig_12512_A.tig_12513_A.tig_9721_Atig_6993_Atig_9177.final_raw_snps.vcf --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filterName "default_snp_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/joint_genotypes/Atig_4278_A.tig_12512_A.tig_12513_A.tig_9721_Atig_6993_Atig_9177.final_filt_snps.vcf
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T SelectVariants -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/joint_genotypes/jg_Atig_4278_A.tig_12512_A.tig_12513_A.tig_9721_Atig_6993_Atig_9177.gvcf -selectType INDEL -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/joint_genotypes/Atig_4278_A.tig_12512_A.tig_12513_A.tig_9721_Atig_6993_Atig_9177.final_raw_indels.vcf
java -Xmx3g -jar /home/dut/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T VariantFiltration -R /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa -V /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/joint_genotypes/Atig_4278_A.tig_12512_A.tig_12513_A.tig_9721_Atig_6993_Atig_9177.final_raw_indels.vcf --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "default_indel_filter" -o /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/joint_genotypes/Atig_4278_A.tig_12512_A.tig_12513_A.tig_9721_Atig_6993_Atig_9177.final_filt_indels.vcf
cat /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/joint_genotypes/jg_Atig_4278_A.tig_12512_A.tig_12513_A.tig_9721_Atig_6993_Atig_9177.gvcf | grep -v '\./\.'  > /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/joint_genotypes/jg_Atig_4278_A.tig_12512_A.tig_12513_A.tig_9721_Atig_6993_Atig_9177.no_call_removed.gvcf
cat /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/joint_genotypes/Atig_4278_A.tig_12512_A.tig_12513_A.tig_9721_Atig_6993_Atig_9177.final_filt_snps.vcf | grep 'default_snp_filter' | cut -f 1,2 > /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/joint_genotypes/bad_snps.tsv
cat /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/joint_genotypes/Atig_4278_A.tig_12512_A.tig_12513_A.tig_9721_Atig_6993_Atig_9177.final_filt_indels.vcf   | grep 'default_indel_filter' | cut -f 1,2 > /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/joint_genotypes/bad_indels.tsv
cat /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/joint_genotypes/bad_indels.tsv /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/joint_genotypes/bad_snps.tsv > /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/joint_genotypes/bad_positions.tsv
vcftools --vcf /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/joint_genotypes/jg_Atig_4278_A.tig_12512_A.tig_12513_A.tig_9721_Atig_6993_Atig_9177.no_call_removed.gvcf --recode --out /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/joint_genotypes/jg_Atig_4278_A.tig_12512_A.tig_12513_A.tig_9721_Atig_6993_Atig_9177.no_call_removed.filt --exclude-positions /n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/gatk_MOLNG-2139_MOLNG-2140/final_variant_calling/joint_genotypes/bad_positions.tsv
::::::::::::::
../data/gatk_MOLNG-2139_MOLNG-2140/master_2017-11-10.sh
::::::::::::::
#!/bin/bash
#echo "gatk_0_prep_ref_2017-11-10"
#time ./gatk_0_prep_ref_2017-11-10.sh &> gatk_0_prep_ref_2017-11-10.out
#echo "gatk_1_map_to_ref_2017-11-10"
#time ./gatk_1_map_to_ref_2017-11-10.sh &> gatk_1_map_to_ref_2017-11-10.out
#echo "gatk_2_dedup_individual_2017-11-10"
#time ./gatk_2_dedup_individual_2017-11-10.sh &> gatk_2_dedup_individual_2017-11-10.out
#echo "gatk_3_merge_bams_2017-11-10"
#time ./gatk_3_merge_bams_2017-11-10.sh &> gatk_3_merge_bams_2017-11-10.out
echo "gatk_4_dedup_merged_2017-11-10"
time ./gatk_4_dedup_merged_2017-11-10.sh &> gatk_4_dedup_merged_2017-11-10.out2
echo "gatk_5_index_2017-11-10"
time ./gatk_5_index_2017-11-10.sh &> gatk_5_index_2017-11-10.out2
echo "gatk_6_realign_bams_2017-11-10"
time ./gatk_6_realign_bams_2017-11-10.sh &> gatk_6_realign_bams_2017-11-10.out2
#echo "gatk_7_training_2017-11-10"
#time ./gatk_7_training_2017-11-10.sh &> gatk_7_training_2017-11-10.out
#echo "gatk_8_recalibrate_bams_2017-11-10"
#time ./gatk_8_recalibrate_bams_2017-11-10.sh &> gatk_8_recalibrate_bams_2017-11-10.out
#echo "gatk_9_final_variant_calling_2017-11-10"
#time ./gatk_9_final_variant_calling_2017-11-10.sh &> gatk_9_final_variant_calling_2017-11-10.out
In [117]:
%%bash
cd ../data/pysam
# for bam in ../gatk_MOLNG-2139_MOLNG-2140/preprocessing/realigned_merged_bams/*.bam;
# do
#     nohup ./../../bin/pysam_profiler.py ${bam} ../gatk_MOLNG-2139_MOLNG-2140/reference/tigris_scaffolds_filt_10000.fa &
# done
In [118]:
%%bash
cd ../data/pysam
#awk '$8!="Homozygous"' A.tig_12512.merged.dedup.realigned.prof > A.tig_12512.merged.dedup.realigned.prof.not_hom &
#awk '$8!="Homozygous"' A.tig_12513.merged.dedup.realigned.prof > A.tig_12513.merged.dedup.realigned.prof.not_hom &
#awk '$8!="Homozygous"' A.tig_9721.merged.dedup.realigned.prof > A.tig_9721.merged.dedup.realigned.prof.not_hom &
#awk '$8!="Homozygous"' Atig_4278.merged.dedup.realigned.prof > Atig_4278.merged.dedup.realigned.prof.not_hom &
#awk '$8!="Homozygous"' Atig_6993.merged.dedup.realigned.prof > Atig_6993.merged.dedup.realigned.prof.not_hom &
#awk '$8!="Homozygous"' Atig_9177.merged.dedup.realigned.prof > Atig_9177.merged.dedup.realigned.prof.not_hom &
In [119]:
%%bash
cd ../data/pysam
for prof in `ls -1 Atig_*.prof | grep -v 122; ls -1 A.tig*.prof` 
do 
echo $prof 
#./../../bin/count_profiles.py $prof &
done
Atig_4278.merged.dedup.realigned.prof
Atig_6993.merged.dedup.realigned.prof
Atig_9177.merged.dedup.realigned.prof
A.tig_12512.merged.dedup.realigned.prof
A.tig_12513.merged.dedup.realigned.prof
A.tig_9721.merged.dedup.realigned.prof
In [120]:
%%bash
cd ../data/pysam
for prof in `ls -1 Atig_*.prof.not_hom | grep -v 122; ls -1 A.tig*.prof.not_hom` 
do 
#echo $prof 
echo "./../../bin/scan_profile_no_lim.py $prof 20 &"
done
./../../bin/scan_profile_no_lim.py Atig_4278.merged.dedup.realigned.prof.not_hom 20 &
./../../bin/scan_profile_no_lim.py Atig_6993.merged.dedup.realigned.prof.not_hom 20 &
./../../bin/scan_profile_no_lim.py Atig_9177.merged.dedup.realigned.prof.not_hom 20 &
./../../bin/scan_profile_no_lim.py A.tig_12512.merged.dedup.realigned.prof.not_hom 20 &
./../../bin/scan_profile_no_lim.py A.tig_12513.merged.dedup.realigned.prof.not_hom 20 &
./../../bin/scan_profile_no_lim.py A.tig_9721.merged.dedup.realigned.prof.not_hom 20 &

Coverage Across Profiles and Effect on Heterozygosity

The following Python script converts an indexed BAM alignment file into a TSV file of per-position nucleotide profiles.

In [ ]:
# %load ../bin/pysam_profiler.py
#!/usr/bin/env python
#Author: Duncan Tormey
#Email: dut@stowers.org or duncantormey@gmail.com
#################################################
# This script takes a bam file and a reference  #
# genome file as input. It uses the pysam       #
# python package to pull out the per position   #
# nucleotide coverage, and determines the       #
# status of the position (Transition,           #
# Transversion, Homozygous or Unknown)          #
#################################################

from __future__ import print_function
from __future__ import division
import pysam
import pysamstats
import sys
import os

def get_prof_status(prof, bases=('A', 'C', 'G', 'T')):
    # prof holds the read counts for A, C, G and T, in that order.
    observed = [bases[i] for i, n in enumerate(prof) if n > 0]
    if len(observed) == 2:
        if observed == ['A', 'G'] or observed == ['C', 'T']:
            status = 'Transition'
        else:
            status = 'Transversion'
    elif len(observed) == 1:
        status = 'Homozygous'
    else:
        status = 'Unknown'

    # Append total coverage and status; note that prof is modified in place.
    prof.extend([sum(prof), status])

    return prof

def write_profiles(bam_path, ref_path):
    outfile = './' + os.path.basename(bam_path).replace('.bam', '.prof')
    bam = pysam.AlignmentFile(bam_path)
    with open(outfile, 'w') as fh:
        fh.write('chrom\tpos\tA\tC\tG\tT\tcov\tstatus\n')
        for rec in pysamstats.stat_variation(bam, ref_path, pad=True, max_depth=1000000):
            prof = [rec['A'], rec['C'], rec['G'], rec['T']]
            prof = get_prof_status(prof)
            prof[:0] = [rec['chrom'], rec['pos']]
            fh.write('%s\n' % '\t'.join(map(str,prof)))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        print(sys.argv)
        write_profiles(sys.argv[1], sys.argv[2])
    else:
        print('usage: pysam_profiler.py /path/to/alignment.bam /path/to/reference.fa')
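
As a quick illustration of the classification rule used by get_prof_status, the sketch below restates it as a pure function and applies it to a few hand-written count vectors in A, C, G, T order. The name classify_profile and the example counts are hypothetical and not part of pysam_profiler.py. Like the script, it treats any base with a nonzero count as observed, with no minimum-support threshold.

```python
# Hypothetical helper for illustration only; it mirrors the rule in
# get_prof_status but does not modify its input.
BASES = ('A', 'C', 'G', 'T')
TRANSITIONS = {('A', 'G'), ('C', 'T')}


def classify_profile(counts):
    """Classify per-base read counts given in A, C, G, T order."""
    observed = tuple(b for b, n in zip(BASES, counts) if n > 0)
    if len(observed) == 1:
        return 'Homozygous'
    if len(observed) == 2:
        return 'Transition' if observed in TRANSITIONS else 'Transversion'
    # Zero observed bases, or three or more, are left unclassified.
    return 'Unknown'


# Made-up count vectors covering each outcome.
examples = [
    [12, 0, 0, 0],
    [9, 0, 8, 0],
    [306, 0, 0, 1],
    [0, 58, 6, 4],
]
for counts in examples:
    print(counts, classify_profile(counts))
```

Because a single stray read is enough to make a site two-allelic under this rule, the status column is best read together with the per-base counts rather than on its own.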

The following Python script takes a TSV profile file as input and counts the occurrences of each unique profile across the genome.

In [ ]:
# %load ../bin/count_profiles.py
#!/usr/bin/env python
#Author: Duncan Tormey
#Email: dut@stowers.org or duncantormey@gmail.com
#################################################
# This script takes the tsv file output by      #
# pysam_profiler.py and counts the occurrences  #
# of each unique profile, ignoring scaffold     #
# and position.                                 #
#################################################


from __future__ import print_function
from __future__ import division
import sys
import os
from collections import Counter


def count_profiles(prof_path):
    outfile = './' + os.path.basename(prof_path).replace('.prof', '.prof_counts')
    counts = Counter()
    with open(prof_path, 'r') as fh:
        next(fh)  # skip header (works in both Python 2 and 3)
        for line in fh:
            line = line.strip().split()
            prof = tuple(line[2:])
            counts[prof] += 1

    return counts, outfile


def write_profiles(counts, outfile):
    with open(outfile, 'w') as fho:
        fho.write('occurence\tA\tC\tG\tT\tcov\tstatus\n')
        for prof, count in counts.items():
            fho.write('%s\t%s\n' % (str(count), '\t'.join(map(str, prof))))
    

if __name__ == "__main__":
    if len(sys.argv) == 2:
        print(sys.argv)
        counts, outfile = count_profiles(sys.argv[1])
        write_profiles(counts, outfile)
    else:
        print('usage: count_profiles.py /path/to/profiles.prof')
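
The counting step can be tried without a real .prof file. The following is a minimal sketch that runs the same Counter logic over a small in-memory table; the rows are made up, and count_profile_lines is a hypothetical stand-in for count_profiles that reads from a file-like object instead of a path.

```python
from collections import Counter
import io

# Made-up rows in the pysam_profiler.py output layout (not real data).
prof_text = (
    u"chrom\tpos\tA\tC\tG\tT\tcov\tstatus\n"
    u"scaf1\t0\t10\t0\t0\t0\t10\tHomozygous\n"
    u"scaf1\t1\t10\t0\t0\t0\t10\tHomozygous\n"
    u"scaf2\t7\t5\t0\t5\t0\t10\tTransition\n"
)


def count_profile_lines(handle):
    """Count identical (A, C, G, T, cov, status) tuples, ignoring position."""
    counts = Counter()
    next(handle)  # skip the header line
    for line in handle:
        fields = line.rstrip('\n').split('\t')
        counts[tuple(fields[2:])] += 1
    return counts


counts = count_profile_lines(io.StringIO(prof_text))
for prof, n in counts.most_common():
    print(n, prof)
```

Dropping the scaffold and position columns is what collapses a genome-sized table into a much smaller one with an occurrence count per distinct profile.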
In [121]:
pysamCountsPaths = {
    'Atig001': '../data/pysam/Atig001.merged.dedup.realigned.prof_counts',
    'A_tigris8450': '../data/pysam/A_tigris8450.merged.dedup.realigned.prof_counts',
    'Atig003': '../data/pysam/Atig003.merged.dedup.realigned.prof_counts',
    'Atig_122': '../data/pysam/Atig_122.merged.dedup.realigned.prof_counts',
    'A.tig_12512': '../data/pysam/A.tig_12512.merged.dedup.realigned.prof_counts',
    'A.tig_12513': '../data/pysam/A.tig_12513.merged.dedup.realigned.prof_counts',
    'A.tig_9721': '../data/pysam/A.tig_9721.merged.dedup.realigned.prof_counts',
    'Atig_4278': '../data/pysam/Atig_4278.merged.dedup.realigned.prof_counts',
    'Atig_6993': '../data/pysam/Atig_6993.merged.dedup.realigned.prof_counts',
    'Atig_9177': '../data/pysam/Atig_9177.merged.dedup.realigned.prof_counts',
}
In [122]:
profCountsDict = {a :pd.read_csv(p, sep = '\t', header = 0) for a,p in pysamCountsPaths.items()}
In [123]:
profCountsDict.keys()
Out[123]:
['A_tigris8450',
 'Atig_122',
 'Atig_4278',
 'Atig003',
 'Atig001',
 'A.tig_12512',
 'A.tig_12513',
 'A.tig_9721',
 'Atig_6993',
 'Atig_9177']
In [124]:
covValuesDict = {}
for animal in profCountsDict.keys():
    profCountsDict[animal]['animal'] = animal
    avg_cov = sum(profCountsDict[animal]['cov']*profCountsDict[animal]['occurence'])/profCountsDict[animal]['occurence'].sum()
    profCountsDict[animal]['avg_cov'] = avg_cov
    covValuesDict[animal] = int(round_up_to_even(avg_cov))
    print(animal, avg_cov)
    

profCountDF= pd.concat(profCountsDict.values())
profCountDF.head()
A_tigris8450 17.5365944457
Atig_122 18.8524151373
Atig_4278 18.9034922139
Atig003 18.3134659875
Atig001 15.9133728851
A.tig_12512 18.2086642398
A.tig_12513 18.4186990258
A.tig_9721 17.907845091
Atig_6993 17.6018239458
Atig_9177 20.0751320379
Out[124]:
occurence A C G T cov status animal avg_cov
0 38 306 0 0 1 307 Transversion A_tigris8450 17.536594
1 3 13 0 199 0 212 Transition A_tigris8450 17.536594
2 1 0 58 6 4 68 Unknown A_tigris8450 17.536594
3 1 1 2918 2 1 2922 Unknown A_tigris8450 17.536594
4 1 4 3 770 2 779 Unknown A_tigris8450 17.536594
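
The average coverage printed above is a mean weighted by occurrence: each distinct coverage value is multiplied by the number of sites that have it. The sketch below reproduces that arithmetic on made-up numbers. round_up_to_even is defined elsewhere in this notebook and is not shown in this section, so round_up_to_even_guess is only an assumed, plausible definition (round up to the next even integer), not the actual helper.

```python
import math

# Made-up (cov, occurence) pairs standing in for one animal's prof_counts rows.
rows = [(16, 3), (18, 5), (20, 2)]


def weighted_mean_cov(rows):
    """Mean coverage, weighting each coverage value by its site count."""
    total_sites = sum(n for _, n in rows)
    return sum(cov * n for cov, n in rows) / float(total_sites)


def round_up_to_even_guess(x):
    # ASSUMPTION: the notebook's round_up_to_even may behave differently.
    return int(math.ceil(x / 2.0) * 2)


avg = weighted_mean_cov(rows)
print(avg, round_up_to_even_guess(avg))
```

An even target coverage is convenient downstream because an equal allele split is only possible when the number of reads at a site is even.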
In [125]:
sns.set_style("whitegrid", {'axes.grid' : False})
animal="A_tigris8450"
outer_lim = 50
cov_dist = profCountDF[profCountDF.animal == animal][['cov','occurence']].groupby('cov')['occurence'].sum().reset_index()
cov_mean = sum(cov_dist['cov'] * cov_dist.occurence)/cov_dist.occurence.sum()
# Pool every site with coverage >= outer_lim into a single bar.
tail = cov_dist['cov'] >= outer_lim
cov_dist.loc[tail, 'occurence'] = cov_dist.loc[tail, 'occurence'].sum()
cov_dist = cov_dist[cov_dist['cov'] <= outer_lim] 

plt.rc('font', size=minorFontSize)
ax1 = cov_dist.plot(x='cov',
                    y='occurence',
                    kind='bar',
                    legend=True,
                    color=color_ids[animal], 
                    label=change_name(animal), 
                    fontsize=minorFontSize, 
                    edgecolor=color_ids[animal],
                    figsize=(1.6,1.88)
                   )

ticks = ax1.xaxis.get_ticklocs()
ticklabels = [l.get_text() for l in ax1.xaxis.get_ticklabels()]
ticklabels[-1] = r'$\geq%s$'%str(outer_lim)
ax1.xaxis.set_ticks(ticks[::5])
ax1.xaxis.set_ticklabels(ticklabels[::5],rotation=90, fontsize=minorFontSize-2)


ax1.set_ylabel('Number of Sites', fontsize=minorFontSize)
ax1.set_xlabel('Coverage', fontsize=minorFontSize)
#ax1.set_title("Distribution of Coverage: %s" % change_name(animal))
ax1.set_ylim(0, 1.4e8)
ax1.set_yticks(np.arange(0,1.5e8,2e7))
#ax1.get_yaxis().get_major_formatter().set_useOffset(False)
ax1.yaxis.get_offset_text().set_fontsize(majorFontSize)
#add fill
ax1.fill_between((cov_mean, ax1.get_xlim()[1]),ax1.get_ylim()[0],ax1.get_ylim()[1],color='grey',alpha=0.2) 

#add figure label
ax1.legend(prop={'size':6})
fig = ax1.get_figure()
fig.savefig('../fig2/Figure3A1.pdf', bbox_inches='tight',)
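
The coverage histogram pools the long right-hand tail into one final bar labelled with a greater-than-or-equal sign. A minimal sketch of that binning step, using a plain dictionary and made-up counts, is shown below; collapse_tail is a hypothetical helper and is not used by the plotting cells.

```python
def collapse_tail(cov_counts, outer_lim):
    """Pool all coverage values >= outer_lim into the outer_lim bin.

    cov_counts maps coverage -> number of sites. A new dict is returned.
    """
    binned = {}
    for cov, n in cov_counts.items():
        key = min(cov, outer_lim)
        binned[key] = binned.get(key, 0) + n
    return binned


# Made-up distribution with a sparse high-coverage tail.
cov_counts = {48: 10, 49: 7, 50: 4, 75: 2, 300: 1}
binned = collapse_tail(cov_counts, 50)
print(sorted(binned.items()))
```

One difference from the plotting cell is worth noting: the cell assigns the pooled total to existing rows and then keeps rows with coverage at or below the limit, so the pooled bar only survives if a row with coverage exactly equal to outer_lim exists. The sketch always creates that bin.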
In [126]:
sns.set_style("whitegrid", {'axes.grid' : False})
animal="Atig_122"
outer_lim = 50
cov_dist = profCountDF[profCountDF.animal == animal][['cov','occurence']].groupby('cov')['occurence'].sum().reset_index()
cov_mean = sum(cov_dist['cov'] * cov_dist.occurence)/cov_dist.occurence.sum()
# Pool every site with coverage >= outer_lim into a single bar.
tail = cov_dist['cov'] >= outer_lim
cov_dist.loc[tail, 'occurence'] = cov_dist.loc[tail, 'occurence'].sum()
cov_dist = cov_dist[cov_dist['cov'] <= outer_lim] 

plt.rc('font', size=minorFontSize)
ax1 = cov_dist.plot(x='cov',
                    y='occurence',
                    kind='bar',
                    legend=True,
                    color=color_ids[animal], 
                    label=change_name(animal), 
                    fontsize=minorFontSize, 
                    edgecolor=color_ids[animal],
                    figsize=(1.6,1.88)
                   )

ticks = ax1.xaxis.get_ticklocs()
ticklabels = [l.get_text() for l in ax1.xaxis.get_ticklabels()]
ticklabels[-1] = r'$\geq%s$'%str(outer_lim)
ax1.xaxis.set_ticks(ticks[::5])
ax1.xaxis.set_ticklabels(ticklabels[::5],rotation=90, fontsize=minorFontSize-2)


ax1.set_ylabel('Number of Sites', fontsize=minorFontSize)
ax1.set_xlabel('Coverage', fontsize=minorFontSize)
#ax1.set_title("Distribution of Coverage: %s" % change_name(animal))
ax1.set_ylim(0, 1.4e8)
ax1.set_yticks(np.arange(0,1.5e8,2e7))
#ax1.get_yaxis().get_major_formatter().set_useOffset(False)
ax1.yaxis.get_offset_text().set_fontsize(majorFontSize)
#add fill
ax1.fill_between((cov_mean, ax1.get_xlim()[1]),ax1.get_ylim()[0],ax1.get_ylim()[1],color='grey',alpha=0.2) 

#add figure label
ax1.legend(prop={'size':6})
fig = ax1.get_figure()
fig.savefig('../fig2/Figure3A2.pdf', bbox_inches='tight')
In [127]:
sns.set(font_scale=1.0)
sns.set_style("whitegrid")
fig=plt.figure(figsize=(6.5,9.0), dpi=100)
fig.subplots_adjust(hspace=0.45,wspace=0.45)
gs = gridspec.GridSpec(4, 3)
row = [0,0,0,1,1,1,2,2,2,3,3,3,]
column = [0,1,2, 0,1,2, 0,1,2, 0,1,2]
letters = ['A', 'B', 'C', 'D', 'E', 'F','G', 'H', 'I', 'J']

for i, animal in enumerate(animal_ids): 
    ax = plt.subplot(gs[row[i], column[i]])
    outer_lim = 50
    cov_dist = profCountDF[profCountDF.animal == animal][['cov','occurence']].groupby('cov')['occurence'].sum().reset_index()
    cov_mean = sum(cov_dist['cov'] * cov_dist.occurence)/cov_dist.occurence.sum()
    # Pool every site with coverage >= outer_lim into a single bar.
    tail = cov_dist['cov'] >= outer_lim
    cov_dist.loc[tail, 'occurence'] = cov_dist.loc[tail, 'occurence'].sum()
    cov_dist = cov_dist[cov_dist['cov'] <= outer_lim] 

    ax = cov_dist.plot(x='cov',
                       y='occurence',
                       kind='bar',
                       legend=False,
                       ax=ax, 
                       color=color_ids[animal], 
                       label=change_name(animal), 
                       fontsize=minorFontSize,
                      )
    ticks = ax.xaxis.get_ticklocs()
    ticklabels = [l.get_text() for l in ax.xaxis.get_ticklabels()]
    ticklabels[-1] = r'$\geq%s$'%str(outer_lim)
    ax.xaxis.set_ticks(ticks[::5])
    ax.xaxis.set_ticklabels(ticklabels[::5],rotation=90, fontsize=minorFontSize)
    ax.set_ylabel('Number of Sites', fontsize=minorFontSize)
    ax.set_xlabel('Coverage', fontsize=minorFontSize)
    #ax.set_title('Animal: %s' % change_name(animal), fontsize=minorFontSize)
    ax.set_ylim(0,140000000)
    ax.fill_between((cov_mean, ax.get_xlim()[1]),ax.get_ylim()[0],ax.get_ylim()[1],color='grey',alpha=0.2) 
    ax.text(23,102000000,'avg. cov = %s' % str(round(cov_mean,2)), fontsize=minorFontSize-3)
    
    ax.legend(loc='best',
              ncol=1,
              markerscale=0.2,
              fontsize=minorFontSize-2)
    ax.text(-0.23, 1.12, letters[i], transform=ax.transAxes,
      fontsize=minorFontSize, fontweight='bold', va='top', ha='right')
    
fig.savefig('../fig2/SupplementalFigure5.pdf',bbox_inches='tight',pad_inches=0.1)
In [128]:
def check_equal_het(row):
    counts = [row['A'],row['T'],row['C'],row['G']]
    non_zero_counts = [n for n in counts if n > 0]
    if len(non_zero_counts)==2 and len(set(non_zero_counts))==1:
        return True
    else:
        return False
In [129]:
def allele_count(row):
    counts = [row['A'],row['T'],row['C'],row['G']]
    non_zero_counts = [n for n in counts if n > 0]
    non_zero_counts = sorted(non_zero_counts)
    return tuple(non_zero_counts)
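
To make the two row helpers concrete, the sketch below applies the same logic to plain dictionaries instead of DataFrame rows. The function names nonzero_counts and is_equal_het and the example rows are hypothetical; they restate allele_count and check_equal_het rather than replace them.

```python
def nonzero_counts(row):
    """Sorted tuple of the nonzero per-base counts, as in allele_count."""
    return tuple(sorted(n for n in (row['A'], row['T'], row['C'], row['G']) if n > 0))


def is_equal_het(row):
    """True when exactly two bases are seen with identical support."""
    counts = nonzero_counts(row)
    return len(counts) == 2 and counts[0] == counts[1]


# Made-up profiles at 18x coverage.
rows = [
    {'A': 9, 'C': 0, 'G': 9, 'T': 0},   # even split between two bases
    {'A': 1, 'C': 0, 'G': 17, 'T': 0},  # one stray read
    {'A': 18, 'C': 0, 'G': 0, 'T': 0},  # single base
]
for row in rows:
    print(nonzero_counts(row), is_equal_het(row))
```

The sorted tuple discards which bases were seen and keeps only how the reads are divided, which is what allows genotypes such as (1, 17) and (9, 9) to be compared across sites.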
In [130]:
profCountDF['allele_count'] = apply_by_multiprocessing(profCountDF, allele_count,8)
In [131]:
profCountDF['equal_het'] = apply_by_multiprocessing(profCountDF,check_equal_het,8)
In [132]:
def ret_per_cov_equal_rates(prof_count_df):
    tuple_list = []
    for animal in prof_count_df.animal.unique():
        for coverage in range(2, 100, 2):
            grouped = prof_count_df[(prof_count_df.animal == animal) & (prof_count_df['cov'] == coverage)].groupby('equal_het').sum()
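            # Either class may be absent at a given coverage. Note that hom
            # counts every site that is not an equal-split het, not only
            # strictly homozygous sites.
            try:
                het = grouped['occurence'][True]
            except (KeyError, IndexError):
                het = 0
            try:
                hom = grouped['occurence'][False]
            except (KeyError, IndexError):
                hom = 0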
            try:
                rate = float(het)/float(het + hom) * 10000
            except ZeroDivisionError:
                rate = 0
            tuple_list.append((animal, coverage, het, hom, het+hom, rate))

    columns = ['animal','cov','num_equal_het', 'homozygous','total_sites','het_per_10kb']
    per_cov_equal_rates_df = pd.DataFrame(tuple_list)
    per_cov_equal_rates_df.columns = columns
    
    return per_cov_equal_rates_df
In [133]:
perCovEqualRatesDF = ret_per_cov_equal_rates(profCountDF)
perCovEqualRatesDF.head()
Out[133]:
animal cov num_equal_het homozygous total_sites het_per_10kb
0 A_tigris8450 2 5583 1012258 1017841 54.851396
1 A_tigris8450 4 1263 3500489 3501752 3.606766
2 A_tigris8450 6 654 10612477 10613131 0.616218
3 A_tigris8450 8 670 25872112 25872782 0.258959
4 A_tigris8450 10 726 49982983 49983709 0.145247
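
The het_per_10kb column is the number of equal-split heterozygous sites divided by all sites at that coverage, scaled to 10,000 sites. The short sketch below recomputes the first two rows of the table above from their counts; the function name het_per_10kb is a hypothetical restatement of the arithmetic inside ret_per_cov_equal_rates.

```python
def het_per_10kb(num_equal_het, other_sites):
    """Equal-split het sites per 10,000 sites at one coverage level."""
    total = num_equal_het + other_sites
    if total == 0:
        return 0.0
    return float(num_equal_het) / total * 10000


# Counts taken from the first two rows of the table above (A_tigris8450).
print(het_per_10kb(5583, 1012258))   # coverage 2
print(het_per_10kb(1263, 3500489))   # coverage 4
```

The column labelled homozygous holds every site that was not flagged as an equal-split heterozygote, so the denominator is the total number of sites at that coverage.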
In [134]:
##### sns.set_style("whitegrid", {'axes.grid' : True})
fig = plt.figure(1, figsize=(3.2, 1.93), dpi=100)
#fig = plt.figure(1, figsize=(32, 19.3))

ax = fig.add_subplot(111)

for animal in og_animal_ids:
    data = perCovEqualRatesDF[(perCovEqualRatesDF.animal == animal) & (perCovEqualRatesDF['cov'] > 2)]
    ax = data.plot(x='cov', 
                    y='het_per_10kb',
                    ylim=(-0.5,18), 
                    xlim = (3.5,100.5), 
                    style='--o', 
                    yticks = np.arange(0,18,1),
                    logy=True,
                    xticks=np.arange(0,102,4),
                    label=change_name(animal), 
                    ax=ax,
                    color=color_ids[animal], 
                    fontsize=minorFontSize, 
                    markersize=4, 
                    linewidth=0.8
)
    
    ax.set_ylabel('Het. Sites / 10kb (Log10 scale)', fontsize=minorFontSize)
    ax.set_xlabel('Coverage', fontsize=minorFontSize)
    #ax2.set_title('Rate of Even Split Heterozygous Sites vs Coverage', fontsize=minorFontSize)
    ax.legend(loc=5, bbox_to_anchor=(1.32, 0.5), prop={'size':8})


ax.set_xticklabels(ax.get_xticks(), rotation=90, fontsize=minorFontSize-2);
ax.set_yticks( [0.1,1,10,50])
ax.yaxis.set_major_formatter(ScalarFormatter())
#ax.yaxis.set_major_formatter(ticker.FuncFormatter(lambda y,pos: ('{{:.{:1d}f}}'.format(int(np.maximum(-np.log10(y),0)))).format(y)))

#fig.savefig('../fig2/Figure6B.pdf', format="pdf",bbox_inches='tight')
In [135]:
##### sns.set_style("whitegrid", {'axes.grid' : True})
fig = plt.figure(1, figsize=(3.2, 1.93),dpi=200)
#fig = plt.figure(1, figsize=(32, 19.3))

ax = fig.add_subplot(111)

for animal in animal_ids:
    data = perCovEqualRatesDF[(perCovEqualRatesDF.animal == animal) & (perCovEqualRatesDF['cov'] > 2)]
    ax = data.plot(x='cov', 
                    y='het_per_10kb',
                    ylim=(-0.5,18), 
                    xlim = (3.5,100.5), 
                    style='--o', 
                    yticks = np.arange(0,18,1),
                    logy=True,
                    xticks=np.arange(0,102,4),
                    label=change_name(animal), 
                    ax=ax,
                    color=color_ids[animal], 
                    fontsize=minorFontSize, 
                    markersize=3, 
                    linewidth=0.8,
                    alpha=0.8
)
    
    ax.set_ylabel('Het. Sites / 10kb (Log10 scale)', fontsize=minorFontSize)
    ax.set_xlabel('Coverage', fontsize=minorFontSize)
    #ax2.set_title('Rate of Even Split Heterozygous Sites vs Coverage', fontsize=minorFontSize)
    ax.legend(loc=5, bbox_to_anchor=(1.32, 0.5), prop={'size':8})


ax.set_xticklabels(ax.get_xticks(), rotation=90, fontsize=minorFontSize-2);
ax.set_yticks( [0.1,1,10,50])
ax.yaxis.set_major_formatter(ScalarFormatter())
#ax.yaxis.set_major_formatter(ticker.FuncFormatter(lambda y,pos: ('{{:.{:1d}f}}'.format(int(np.maximum(-np.log10(y),0)))).format(y)))

fig.savefig('../fig2/Figure3B.pdf', format="pdf",bbox_inches='tight')

Confidence in Allele Support Inferred from Transition/Transversion Ratio

In [136]:
def ret_tntv_df(prof_count_df):
    tuple_list = []
    for animal in prof_count_df.animal.unique():
        for cov in xrange(10,26,2):
            temp = prof_count_df[(prof_count_df.status != 'Homozygous') & (prof_count_df.animal == animal) & (prof_count_df['cov'] == cov)].sort_values('allele_count')
            genos = temp['allele_count'].unique().tolist()
            for geno in genos:
                transitions = temp[(temp.status=='Transition')&(temp.allele_count == geno)]['occurence'].sum()
                transversions = temp[(temp.status=='Transversion')&(temp.allele_count == geno)]['occurence'].sum()
                total = temp[temp.allele_count == geno]['occurence'].sum()
                try:
                    ratio = float(transitions)/float(transversions)
                    definable = True
                except ZeroDivisionError:
                    ratio = 0
                    definable = False   
                t = (animal, cov, geno, transitions, transversions, ratio, definable, total)

                tuple_list.append(t)



    tntv_df = pd.DataFrame(tuple_list)
    tntv_df.columns = ['animal','cov','allele_count','transitions','transversions', 'tn_tv_ratio','definable','total']

    return tntv_df
In [137]:
tntvDF = ret_tntv_df(profCountDF)
tntvDF.head()
Out[137]:
animal cov allele_count transitions transversions tn_tv_ratio definable total
0 A_tigris8450 10 (1, 1, 1, 7) 0 0 0.0 False 42
1 A_tigris8450 10 (1, 1, 2, 6) 0 0 0.0 False 17
2 A_tigris8450 10 (1, 1, 3, 5) 0 0 0.0 False 2
3 A_tigris8450 10 (1, 1, 8) 0 0 0.0 False 6942
4 A_tigris8450 10 (1, 2, 2, 5) 0 0 0.0 False 1
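
Each row of this table carries a transition to transversion ratio together with a flag saying whether the ratio could be computed. A minimal sketch of that calculation follows; tn_tv_ratio is a hypothetical helper, and the counts in the example calls are made up rather than taken from the data.

```python
def tn_tv_ratio(transitions, transversions):
    """Return (ratio, definable); the ratio is undefined with no transversions."""
    if transversions == 0:
        return 0.0, False
    return float(transitions) / transversions, True


# Made-up counts for illustration.
print(tn_tv_ratio(2100, 1000))
print(tn_tv_ratio(0, 0))
```

Genotypes with three or four observed bases are classed as Unknown upstream, so they contribute neither transitions nor transversions and always come out as not definable, as in the rows shown above.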
In [138]:
fig = plt.figure(1, figsize=(3.2, 1.93), dpi=100)

ax = fig.add_subplot(111)
animal = 'Atig_122'
cov = 18

subed = tntvDF[(tntvDF.animal == animal) & (tntvDF['cov']==cov) & (tntvDF.definable==True)]

ax = sns.barplot(x = subed.allele_count, y = subed.transitions + subed.transversions, color = "lightblue",ax=ax)

#Plot 2 - overlay - "bottom" series
bottom_plot = sns.barplot(x = subed.allele_count, y = subed.transversions, color = "purple",ax=ax)

#legend
topbar = plt.Rectangle((0,0),1,1,fc="lightblue", edgecolor = 'none')
bottombar = plt.Rectangle((0,0),1,1,fc='purple',  edgecolor = 'none')

l = ax.legend([bottombar, topbar], 
              ['Transversions', 'Transitions'], 
              loc=1, 
              ncol = 1, 
              fontsize=minorFontSize
              )

l.draw_frame(False)

#label bars
rects = ax.patches

labels = ['R: {}\n N: {:,}'.format(str(i)[:4], t) for i,t in zip(subed.tn_tv_ratio, subed.total)]

for rect, label in zip(rects, labels):
    height = rect.get_height()
    ax.text(rect.get_x() + rect.get_width()/2, height + 27, label, ha='center',rotation=40, va='bottom', fontsize=minorFontSize-4,)
    
bottom_plot.set_ylabel("Number of Sites", fontsize=minorFontSize)
bottom_plot.set_xlabel("Genotype", fontsize=minorFontSize)
ax.set_xticklabels(ax.get_xticklabels(),rotation=45, fontsize=minorFontSize)
ax.set_xlim(ax.get_xlim()[0]-0.3,
            ax.get_xlim()[1]+0.3)

ax.set_title('%s Genotypes at %sX'%(change_name(animal), str(cov)), fontsize=minorFontSize)
ax.set_yticks(xrange(0,4500000,500000));
ax.ticklabel_format(style='sci',scilimits=(-3,4),axis='y',useOffset=True, fontsize=minorFontSize)


#fig.savefig('../fig/Figure6D_v1.png',bbox_inches='tight', dpi=300)
#fig.savefig('../fig2/Figure6D.pdf',bbox_inches='tight')
In [139]:
fig = plt.figure(1, figsize=(3.2, 1.93), dpi=200)

ax = fig.add_subplot(111)
animal = 'Atig_122'
cov = 18

subed = tntvDF[(tntvDF.animal == animal) & (tntvDF['cov']==cov) & (tntvDF.definable==True)]
subed = subed[subed.allele_count != (1,cov-1)]

ax = sns.barplot(x = subed.allele_count, y = subed.transitions + subed.transversions, color = "lightblue",ax=ax)

#Plot 2 - overlay - "bottom" series
bottom_plot = sns.barplot(x = subed.allele_count, y = subed.transversions, color = "purple",ax=ax)

#legend
topbar = plt.Rectangle((0,0),1,1,fc="lightblue", edgecolor = 'none')
bottombar = plt.Rectangle((0,0),1,1,fc='purple',  edgecolor = 'none')

l = ax.legend([bottombar, topbar], 
              ['Transversions', 'Transitions'], 
              loc='upper center', 
              ncol = 1, 
              fontsize=minorFontSize-2
              )

l.draw_frame(False)

#label bars
rects = ax.patches

labels = ['R: {}\n N: {:,}'.format(str(i)[:4], t) for i,t in zip(subed.tn_tv_ratio, subed.total)]

for rect, label in zip(rects, labels):
    height = rect.get_height()
    ax.text(rect.get_x() + rect.get_width()/2, height + 27, label, ha='center',rotation=40, va='bottom', fontsize=minorFontSize-4,)
    
bottom_plot.set_ylabel("Number of Sites", fontsize=minorFontSize)
bottom_plot.set_xlabel("Genotype", fontsize=minorFontSize)
ax.set_xticklabels(ax.get_xticklabels(),rotation=45, fontsize=minorFontSize)
ax.set_xlim(ax.get_xlim()[0]-0.3,
            ax.get_xlim()[1]+0.3)


ax.set_title('%s Genotypes at %sX'%(change_name(animal), str(cov)), fontsize=minorFontSize)
ax.ticklabel_format(style='sci',scilimits=(-3,4),axis='y',useOffset=True, fontsize=minorFontSize)
ax.set_yticks(xrange(0,320000,50000));

#fig.savefig('../fig/Figure6D_v2.png',bbox_inches='tight', dpi=300)
#fig.savefig('../fig/Figure6D_v2.pdf',bbox_inches='tight')
In [140]:
fig = plt.figure(1, figsize=(3.2, 1.93))

ax = fig.add_subplot(111)
animal = 'A_tigris8450'
cov = 18

subed = tntvDF[(tntvDF.animal == animal) & (tntvDF['cov']==cov)&(tntvDF.definable==True)]

ax = sns.barplot(x = subed.allele_count, y = subed.transitions + subed.transversions, color = "lightblue",ax=ax)

#Plot 2 - overlay - "bottom" series
bottom_plot = sns.barplot(x = subed.allele_count, y = subed.transversions, color = "purple",ax=ax)

#legend
topbar = plt.Rectangle((0,0),1,1,fc="lightblue", edgecolor = 'none')
bottombar = plt.Rectangle((0,0),1,1,fc='purple',  edgecolor = 'none')
l = ax.legend([bottombar, topbar], ['Transversions', 'Transitions'], loc=1, ncol = 1, fontsize=minorFontSize - 2
              )
l.draw_frame(False)

#label bars
rects = ax.patches

# Now make some labels
labels = ['R: {}\n N: {:,}'.format(str(i)[:4], t) for i,t in zip(subed.tn_tv_ratio, subed.total)]

for rect, label in zip(rects, labels):
    height = rect.get_height()
    ax.text(rect.get_x() + rect.get_width()/2, height + 27, label, ha='center',rotation=40, va='bottom', fontsize=minorFontSize-4,)

bottom_plot.set_ylabel("Number of Sites", fontsize=minorFontSize)
bottom_plot.set_xlabel("Genotype", fontsize=minorFontSize)
ax.set_xticklabels(ax.get_xticklabels(),rotation=45, fontsize=minorFontSize)
ax.set_xlim(ax.get_xlim()[0]-0.3,
            ax.get_xlim()[1]+0.3)


ax.set_title('%s Genotypes at %sX'%(change_name(animal), str(cov)), fontsize=minorFontSize)
ax.ticklabel_format(style='sci',scilimits=(-3,4),axis='y',useOffset=True, fontsize=minorFontSize)
ax.set_yticks(xrange(0,4500000,500000));

#fig.savefig('../fig/Figure6E_v1.png',bbox_inches='tight', dpi=300)
#fig.savefig('../fig2/Figure6E.pdf',bbox_inches='tight')
In [141]:
fig = plt.figure(1, figsize=(3.2, 1.93), dpi=200)

ax = fig.add_subplot(111)
animal = 'A_tigris8450'
cov = 18

subed = tntvDF[(tntvDF.animal == animal) & (tntvDF['cov']==cov)&(tntvDF.definable==True)]
subed = subed[subed.allele_count != (1,cov-1)]

ax = sns.barplot(x = subed.allele_count, y = subed.transitions + subed.transversions, color = "lightblue",ax=ax)

#Plot 2 - overlay - "bottom" series
bottom_plot = sns.barplot(x = subed.allele_count, y = subed.transversions, color = "purple",ax=ax)

#legend
topbar = plt.Rectangle((0,0),1,1,fc="lightblue", edgecolor = 'none')
bottombar = plt.Rectangle((0,0),1,1,fc='purple',  edgecolor = 'none')
l = ax.legend([bottombar, topbar], ['Transversions', 'Transitions'], loc=1, ncol = 1, fontsize=minorFontSize
              )
l.draw_frame(False)

#label bars
rects = ax.patches

# Now make some labels
labels = ['R: {}\n N: {:,}'.format(str(i)[:4], t) for i,t in zip(subed.tn_tv_ratio, subed.total)]

for rect, label in zip(rects, labels):
    height = rect.get_height()
    ax.text(rect.get_x() + rect.get_width()/2, height + 27, label, ha='center',rotation=40, va='bottom', fontsize=minorFontSize-4,)

bottom_plot.set_ylabel("Number of Sites", fontsize=minorFontSize)
bottom_plot.set_xlabel("Genotype", fontsize=minorFontSize)
ax.set_xticklabels(ax.get_xticklabels(),rotation=45, fontsize=minorFontSize)
ax.set_xlim(ax.get_xlim()[0]-0.3,
            ax.get_xlim()[1]+0.3)


ax.set_title('%s Genotypes at %sX'%(change_name(animal), str(cov)), fontsize=minorFontSize)
ax.ticklabel_format(style='sci',scilimits=(-3,4),axis='y',useOffset=True, fontsize=minorFontSize)
ax.set_yticks(xrange(0,320000,50000));


#fig.savefig('../fig/Figure6E_v2.png',bbox_inches='tight', dpi=300)
#fig.savefig('../fig/Figure6E_v2.pdf',bbox_inches='tight')
In [142]:
sns.set(font_scale=1.0)
sns.set_style("whitegrid")
fig=plt.figure(figsize=(6.5,9.0), dpi=200)
fig.subplots_adjust(hspace=0.8,wspace=0.6)
gs = gridspec.GridSpec(4, 3)
row = [0,0,0,1,1,1,2,2,2,3,3,3,]
column = [0,1,2, 0,1,2, 0,1,2, 0,1,2]
letters = ['A', 'B', 'C', 'D', 'E', 'F','G', 'H', 'I', 'J']



for i, animal in enumerate(animal_ids):
    ax = plt.subplot(gs[row[i], column[i]])
    cov = covValuesDict[animal]
    subed = tntvDF[(tntvDF.animal == animal) & (tntvDF['cov']==cov)&(tntvDF.definable==True)]
    subed = subed[subed.allele_count != (1,cov-1)]

    ax = sns.barplot(x = subed.allele_count, y = subed.transitions + subed.transversions, color = "lightblue",ax=ax)

    #Plot 2 - overlay - "bottom" series
    bottom_plot = sns.barplot(x = subed.allele_count, y = subed.transversions, color = "purple",ax=ax)

    #legend
    topbar = plt.Rectangle((0,0),1,1,fc="lightblue", edgecolor = 'none')
    bottombar = plt.Rectangle((0,0),1,1,fc='purple',  edgecolor = 'none')
    l = ax.legend([bottombar, topbar], 
                  ['Transversions', 'Transitions'], 
                  markerscale=0.4,
                  loc=1, 
                  ncol=1, 
                  fontsize=minorFontSize -3
                  )
    l.draw_frame(False)

    #label bars
    rects = ax.patches

    # Now make some labels
    labels = ['R: {}\n N: {:,}'.format(str(k)[:4], t) for k,t in zip(subed.tn_tv_ratio, subed.total)]

    for rect, label in zip(rects, labels):
        height = rect.get_height()
        ax.text(rect.get_x() + rect.get_width()/2, height + 27, label, ha='center',rotation=40, va='bottom', fontsize=minorFontSize-5,)

    bottom_plot.set_ylabel("Number of Sites In Genome", fontsize=minorFontSize)
    bottom_plot.set_xlabel("Genotype", fontsize=minorFontSize)
    ax.set_xticklabels(ax.get_xticklabels(),rotation=45, fontsize=minorFontSize-2)


    ax.set_title(change_name(animal), fontsize=minorFontSize)
    ax.ticklabel_format(style='sci',scilimits=(-3,4),axis='y',useOffset=True)
    ax.set_yticks(xrange(0,400000,50000))
    ax.text(-0.13, 1.22, 
            letters[i], 
            transform=ax.transAxes,
            fontsize=majorFontSize, 
            fontweight='bold', 
            va='top', 
            ha='right')


#fig.savefig('../fig2/supplemental_figure_4.pdf',bbox_inches='tight',pad_inches=0.1)

General Genotype Support

In [143]:
tntvDF.animal.unique()
Out[143]:
array(['A_tigris8450', 'Atig_122', 'Atig_4278', 'Atig003', 'Atig001',
       'A.tig_12512', 'A.tig_12513', 'A.tig_9721', 'Atig_6993', 'Atig_9177'], dtype=object)
In [144]:
tntvDF['num_alleles'] = tntvDF['allele_count'].apply(lambda x: len(x))
tntvDF.head()
Out[144]:
         animal  cov  allele_count  transitions  transversions  tn_tv_ratio  definable  total  num_alleles
0  A_tigris8450   10  (1, 1, 1, 7)            0              0          0.0      False     42            4
1  A_tigris8450   10  (1, 1, 2, 6)            0              0          0.0      False     17            4
2  A_tigris8450   10  (1, 1, 3, 5)            0              0          0.0      False      2            4
3  A_tigris8450   10     (1, 1, 8)            0              0          0.0      False   6942            3
4  A_tigris8450   10  (1, 2, 2, 5)            0              0          0.0      False      1            4
In [145]:
sns.set(font_scale=1.0)
sns.set_style("whitegrid")
letters = ['A', 'B', 'C', 'D', 'E', 'F','G', 'H', 'I', 'J']
for e, animal in enumerate(animal_ids):
    fig,(ax,ax2) = plt.subplots(2, 1, sharex=True, figsize=(3.2, 1.93), dpi=100)

    data = tntvDF[(tntvDF['cov']==covValuesDict[animal]) & \
                  (tntvDF['animal']==animal)]
        
    imp_total = data.loc[data['num_alleles'] > 2,'total'].sum()
    
    temp = data[data.num_alleles < 3][['animal', 'total', 'allele_count']]
    
    new_row = pd.DataFrame([{'animal': animal ,'allele_count':'Alleles > 2', 'total': imp_total}])
    data = pd.concat([temp,new_row])

    

    data.plot(x='allele_count', y='total', kind='bar', width=1, color=color_ids[animal], ax=ax, legend=False, title=id_to_name[animal])
    data.plot(x='allele_count', y='total', kind='bar', width=1, color=color_ids[animal], ax=ax2, legend=False)

    ax.set_ylim(1000000,7000000)
    ax2.set_ylim(0,240000)
    
    ax.xaxis.grid(False)
    ax2.xaxis.grid(False)
    

    # ticklabel_format takes no fontsize argument; tick label sizes are set further down
    ax.ticklabel_format(style='sci',scilimits=(-3,4),axis='y',useOffset=False)
    ax2.ticklabel_format(style='sci',scilimits=(-3,4),axis='y',useOffset=False)

    ax2.set_xlabel('Genotype')

    ax.spines['bottom'].set_visible(False)
    ax2.spines['top'].set_visible(False)

    for i, v in enumerate(data.total):
        if v < 200000:
            ax2.text(i , v+8000,
                     "{:,}".format(v), 
                     color='black',
                     #fontweight='bold',
                     fontsize=minorFontSize - 2,
                     horizontalalignment='left', 
                     verticalalignment='bottom', 
                     rotation=45)
        elif v > 1000000:
            ax.text(i , v+80000,
                    "{:,}".format(v), 
                    color='black',
                    fontsize=minorFontSize - 2,
                    #fontweight='bold', 
                    horizontalalignment='left', 
                    verticalalignment='bottom',
                    rotation=45)

    d = .011  # how big to make the diagonal lines in axes coordinates
    # arguments to pass to plot, just so we don't keep repeating them
    kwargs = dict(transform=ax.transAxes, color='k', clip_on=False)
    ax.plot((-d, +d), (-d, +d), **kwargs)        # top-left diagonal
    ax.plot((1 - d, 1 + d), (-d, +d), **kwargs)  # top-right diagonal

    kwargs.update(transform=ax2.transAxes)  # switch to the bottom axes
    ax2.plot((-d, +d), (1 - d, 1 + d), **kwargs)  # bottom-left diagonal
    ax2.plot((1 - d, 1 + d), (1 - d, 1 + d), **kwargs)  # bottom-right diagonal
    


    
    for item in ([ax.title, ax.xaxis.label, ax.yaxis.label] + ax.get_xticklabels() + ax.get_yticklabels()):
        item.set_fontsize(minorFontSize)
    
    for item in ([ax2.title, ax2.xaxis.label, ax2.yaxis.label] + ax2.get_xticklabels() + ax2.get_yticklabels()):
        item.set_fontsize(minorFontSize)
    
    ax.yaxis.get_offset_text().set_size(minorFontSize-2)
    ax2.yaxis.get_offset_text().set_size(minorFontSize-2)
    
#     fig.savefig('../fig2/SupplementalFigure5%s.pdf' % letters[e], format="pdf",bbox_inches='tight')

    
#     if animal == 'Atig_122':
#         fig.savefig('../fig2/Figure3C.pdf', format="pdf",bbox_inches='tight')
#     elif animal == 'A_tigris8450':
#         fig.savefig('../fig2/Figure3D.pdf', format="pdf",bbox_inches='tight')
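
The loop above keeps genotypes with one or two alleles and folds every genotype with more than two alleles into a single 'Alleles > 2' bar. A minimal plain-Python sketch of that collapsing step follows. The rows and the helper name `collapse_multiallelic` are hypothetical and for illustration only; the notebook itself does this with pandas on `tntvDF`.

```python
# Hypothetical rows mirroring two columns of tntvDF (values are illustrative only)
rows = [
    {'allele_count': (1, 9), 'total': 120},
    {'allele_count': (5, 5), 'total': 80},
    {'allele_count': (1, 1, 8), 'total': 7},
    {'allele_count': (1, 1, 1, 7), 'total': 3},
]

def collapse_multiallelic(rows, label='Alleles > 2'):
    """Keep genotypes with at most two alleles and sum the rest into one row."""
    kept = [r for r in rows if len(r['allele_count']) < 3]
    extra = sum(r['total'] for r in rows if len(r['allele_count']) > 2)
    return kept + [{'allele_count': label, 'total': extra}]

collapsed = collapse_multiallelic(rows)
for r in collapsed:
    print('{}: {}'.format(r['allele_count'], r['total']))
```

Appending the summed row last matches the order produced by `pd.concat([temp, new_row])`, so the collapsed category is drawn as the final bar.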
In [146]:
sns.set(font_scale=1.0)
sns.set_style("whitegrid")
letters = ['A', 'B', 'C', 'D', 'E', 'F','G', 'H', 'I', 'J']
for e, animal in enumerate(animal_ids):

    fig,(ax, ax2) = plt.subplots(1, 2, 
                                 sharey=True, 
                                 figsize=(3.2, 1.93), 
                                 dpi=200, 
                                 gridspec_kw = {'wspace':0.1, 'hspace':0}
                                )


    fig.suptitle(id_to_name[animal], fontsize=majorFontSize)
    
    data = tntvDF[(tntvDF['cov']==covValuesDict[animal]) & \
                  (tntvDF['animal']==animal) & \
                  (tntvDF.num_alleles < 3)]
#     imp_total = data.loc[data['num_alleles'] > 2,'total'].sum()
    
#     temp = data[data.num_alleles < 3][['animal', 'total', 'allele_count']]
 
#     new_row = pd.DataFrame([{'animal': animal ,'allele_count':'Alleles > 2', 'total': imp_total}])
#     data = pd.concat([temp,new_row])
 

    

    data.plot(x='allele_count', y='total', kind='barh', width=1, color=color_ids[animal], ax=ax, legend=False)
    data.plot(x='allele_count', y='total', kind='barh', width=1, color=color_ids[animal], ax=ax2, legend=False)

    ax2.set_xlim(1000000,7000000)
    ax.set_xlim(0,240000)
    
    ax.yaxis.grid(False)
    ax2.yaxis.grid(False)
    

    # ticklabel_format takes no fontsize argument; tick label sizes are set further down
    ax.ticklabel_format(style='sci',scilimits=(-3,4),axis='x',useOffset=False)
    ax2.ticklabel_format(style='sci',scilimits=(-3,4),axis='x',useOffset=False)

    ax.set_ylabel('Genotype')

    ax.spines['right'].set_visible(False)
    ax2.spines['left'].set_visible(False)

    for i, v in enumerate(data.total):
        if v < 200000:
            ax.text(v+9000, i,
                     "{:,}".format(v), 
                     color='black',
                     #fontweight='bold',
                     fontsize=minorFontSize - 2,
                     horizontalalignment='left', 
                     verticalalignment='center', 
                     rotation=0)
        elif v > 1000000:
            ax2.text(v + 80000, i,
                    "{:,}".format(v), 
                    color='black',
                    fontsize=minorFontSize - 2,
                    #fontweight='bold', 
                    horizontalalignment='left', 
                    verticalalignment='center',
                    rotation=0)

    d = .013 # how big to make the diagonal lines in axes coordinates
    # arguments to pass to plot, just so we don't keep repeating them
    kwargs = dict(transform=ax.transAxes, color='lightgrey', clip_on=False, lw=1.0)
    ax.plot((1-d,1+d),(-d,+d), **kwargs) # left panel, bottom-right diagonal
    ax.plot((1-d,1+d),(1-d,1+d), **kwargs) # left panel, top-right diagonal

    kwargs.update(transform=ax2.transAxes) # switch to the right-hand axes
    ax2.plot((-d,d),(-d,+d), **kwargs) # right panel, bottom-left diagonal
    ax2.plot((-d,d),(1-d,1+d), **kwargs) # right panel, top-left diagonal
    


    
    for item in ([ax.title, ax.xaxis.label, ax.yaxis.label] + ax.get_xticklabels() + ax.get_yticklabels()):
        item.set_fontsize(minorFontSize)
    
    for item in ([ax2.title, ax2.xaxis.label, ax2.yaxis.label] + ax2.get_xticklabels() + ax2.get_yticklabels()):
        item.set_fontsize(minorFontSize)
    
    ax.yaxis.get_offset_text().set_size(minorFontSize-2)
    ax2.yaxis.get_offset_text().set_size(minorFontSize-2)
    
    fig.savefig('../fig2/SupplementalFigure6%s.pdf' % letters[e], format="pdf",bbox_inches='tight')
    plt.show()
    
#     if animal == 'Atig_122':
#         fig.savefig('../fig2/Figure3C.pdf', format="pdf",bbox_inches='tight')
#     elif animal == 'A_tigris8450':
#         fig.savefig('../fig2/Figure3D.pdf', format="pdf",bbox_inches='tight')

Distribution of Genome Average Even Heterozygous Sites

In [147]:
covValuesDict
Out[147]:
{'A.tig_12512': 20,
 'A.tig_12513': 20,
 'A.tig_9721': 18,
 'A_tigris8450': 18,
 'Atig001': 16,
 'Atig003': 20,
 'Atig_122': 20,
 'Atig_4278': 20,
 'Atig_6993': 18,
 'Atig_9177': 22}
In [214]:
%%bash

# awk '$7==16' ../data/pysam/Atig001.merged.dedup.realigned.prof.not_hom > ../data/pysam/Atig001.merged.dedup.realigned.prof.avg_cov &
# awk '$7==20' ../data/pysam/Atig003.merged.dedup.realigned.prof.not_hom > ../data/pysam/Atig003.merged.dedup.realigned.prof.avg_cov &
# awk '$7==20' ../data/pysam/Atig_122.merged.dedup.realigned.prof.not_hom > ../data/pysam/Atig_122.merged.dedup.realigned.prof.avg_cov &
# awk '$7==20' ../data/pysam/A.tig_12512.merged.dedup.realigned.prof.not_hom > ../data/pysam/A.tig_12512.merged.dedup.realigned.prof.avg_cov &
# awk '$7==20' ../data/pysam/A.tig_12513.merged.dedup.realigned.prof.not_hom > ../data/pysam/A.tig_12513.merged.dedup.realigned.prof.avg_cov &
# awk '$7==20' ../data/pysam/Atig_4278.merged.dedup.realigned.prof.not_hom > ../data/pysam/Atig_4278.merged.dedup.realigned.prof.avg_cov &
# awk '$7==18' ../data/pysam/Atig_6993.merged.dedup.realigned.prof.not_hom > ../data/pysam/Atig_6993.merged.dedup.realigned.prof.avg_cov &
# awk '$7==22' ../data/pysam/Atig_9177.merged.dedup.realigned.prof.not_hom > ../data/pysam/Atig_9177.merged.dedup.realigned.prof.avg_cov &
# awk '$7==18' ../data/pysam/A.tig_9721.merged.dedup.realigned.prof.not_hom > ../data/pysam/A.tig_9721.merged.dedup.realigned.prof.avg_cov &
# awk '$7==18' ../data/pysam/A_tigris8450.merged.dedup.realigned.prof.not_hom > ../data/pysam/A_tigris8450.merged.dedup.realigned.prof.avg_cov &
In [223]:
%%bash
head ../data/pysam/Atig001.merged.dedup.realigned.prof.not_hom
head ../data/pysam/Atig001.merged.dedup.realigned.prof.avg_cov
chrom	pos	A	C	G	T	cov	status
Scpiz6a_49	6	0	49	7	37	93	Unknown
Scpiz6a_49	10	0	86	3	7	96	Unknown
Scpiz6a_49	11	1	47	51	0	99	Unknown
Scpiz6a_49	12	0	3	0	115	118	Transition
Scpiz6a_49	16	3	0	131	0	134	Transition
Scpiz6a_49	17	0	54	80	0	134	Transversion
Scpiz6a_49	18	6	8	0	146	160	Unknown
Scpiz6a_49	23	4	0	177	0	181	Transition
Scpiz6a_49	24	0	176	0	6	182	Transition
Scpiz6a_49	557	10	0	6	0	16	Transition
Scpiz6a_49	2067	0	0	14	2	16	Transversion
Scpiz6a_49	2875	15	0	1	0	16	Transition
Scpiz6a_49	3485	0	1	0	15	16	Transition
Scpiz6a_49	3488	0	1	15	0	16	Transversion
Scpiz6a_49	3489	0	0	1	15	16	Transversion
Scpiz6a_49	3491	0	0	15	1	16	Transversion
Scpiz6a_49	3521	15	0	0	1	16	Transversion
Scpiz6a_49	4234	1	0	15	0	16	Transition
Scpiz6a_49	4235	1	15	0	0	16	Transversion
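
The `status` column in the output above appears to follow the usual transition/transversion rule: a site with exactly two observed bases is a transition when both are purines (A, G) or both are pyrimidines (C, T), and a transversion otherwise. The sketch below reproduces the labels shown for those rows. It is inferred from the output, not taken from the code that produced the `.prof` files; the function name `classify_site` and the treatment of every non-biallelic site as 'Unknown' are assumptions.

```python
PURINES = frozenset('AG')
PYRIMIDINES = frozenset('CT')

def classify_site(a, c, g, t):
    """Label a site from its per-base read counts (assumed rule, see text)."""
    # Bases with at least one supporting read
    present = frozenset(b for b, n in zip('ACGT', (a, c, g, t)) if n > 0)
    if len(present) != 2:
        return 'Unknown'          # assumption: anything not biallelic is 'Unknown'
    if present == PURINES or present == PYRIMIDINES:
        return 'Transition'       # A<->G or C<->T
    return 'Transversion'         # purine <-> pyrimidine

# Counts copied from three rows of the output above (A, C, G, T)
print(classify_site(0, 3, 0, 115))   # position 12
print(classify_site(0, 54, 80, 0))   # position 17
print(classify_site(0, 49, 7, 37))   # position 6
```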
In [321]:
df_list = []
for animal in animal_ids:
    df = pd.read_csv('../data/pysam/%s.merged.dedup.realigned.prof.avg_cov' % animal, sep='\t',names=['chrom','pos','A','C','G','T','cov','status'])
    df['animal'] = animal
    df_list.append(df)
    
avgCovHetSitesDF = pd.concat(df_list,axis=0)
avgCovHetSitesDF.head()
Out[321]:
        chrom   pos   A  C  G   T  cov        status   animal
0  Scpiz6a_49   651  19  1  0   0   20  Transversion  Atig003
1  Scpiz6a_49  2292   0  1  0  19   20    Transition  Atig003
2  Scpiz6a_49  3000  19  0  0   1   20  Transversion  Atig003
3  Scpiz6a_49  5047   0  1  0  19   20    Transition  Atig003
4  Scpiz6a_49  5864   0  1  0  19   20    Transition  Atig003
In [322]:
avgCovHetSitesDF['allele_count'] = apply_by_multiprocessing(avgCovHetSitesDF, allele_count,8)
avgCovHetSitesDF.head()
Out[322]:
        chrom   pos   A  C  G   T  cov        status   animal  allele_count
0  Scpiz6a_49   651  19  1  0   0   20  Transversion  Atig003       (1, 19)
1  Scpiz6a_49  2292   0  1  0  19   20    Transition  Atig003       (1, 19)
2  Scpiz6a_49  3000  19  0  0   1   20  Transversion  Atig003       (1, 19)
3  Scpiz6a_49  5047   0  1  0  19   20    Transition  Atig003       (1, 19)
4  Scpiz6a_49  5864   0  1  0  19   20    Transition  Atig003       (1, 19)
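
Judging by the table above, `allele_count` is the tuple of non-zero base counts at a site, sorted in ascending order. The `allele_count` function itself is defined elsewhere in the notebook, so the version below is only a hypothetical stand-in (named `allele_count_sketch`) that reproduces the values shown; the real function presumably takes a DataFrame row rather than four integers.

```python
def allele_count_sketch(a, c, g, t):
    """Return the non-zero base counts as an ascending tuple."""
    return tuple(sorted(n for n in (a, c, g, t) if n > 0))

# Counts copied from the first row above: A=19, C=1, G=0, T=0
print(allele_count_sketch(19, 1, 0, 0))
# An evenly supported heterozygous site has two equal counts
genotype = allele_count_sketch(0, 10, 0, 10)
print(len(genotype) == 2 and genotype[0] == genotype[1])
```

Sorting the counts discards which bases carry the reads, so sites such as A/C and G/T with the same read split fall into the same genotype class.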
In [323]:
avgCovHetSitesDF['equal'] = avgCovHetSitesDF['allele_count'].apply(lambda x: len(x)==2 and x[0] == x[1])  # check the length first so x[1] is never read from a 1-tuple
avgCovEqualHetSitesDF = avgCovHetSitesDF[avgCovHetSitesDF['equal']==True].reset_index(drop=True)
avgCovEqualHetSitesDF.head()
Out[323]:
        chrom      pos   A   C   G   T  cov        status   animal  allele_count  equal
0  Scpiz6a_49   357652   0  10   0  10   20    Transition  Atig003      (10, 10)   True
1  Scpiz6a_49   533563   0  10   0  10   20    Transition  Atig003      (10, 10)   True
2  Scpiz6a_49   533564  10  10   0   0   20  Transversion  Atig003      (10, 10)   True
3  Scpiz6a_49  1076362  10   0   0  10   20  Transversion  Atig003      (10, 10)   True
4  Scpiz6a_49  1258626  10   0  10   0   20    Transition  Atig003      (10, 10)   True
In [324]:
scaffoldSizes['genome_scale'] = scaffoldSizes.scaffold_size.cumsum()
scaffoldSizes.genome_scale = scaffoldSizes.genome_scale.shift(1)
scaffoldSizes.fillna(value=0, inplace=True)
scaffoldScaleDict = dict(zip(scaffoldSizes.scaffold.tolist(), scaffoldSizes.genome_scale.tolist()))
display(scaffoldSizes.head())
scaffoldScaleDict
          scaffold  scaffold_size  genome_scale
3825    Scpiz6a_49       85027298           0.0
3824  Scpiz6a_55.1       77407970    85027298.0
3823    Scpiz6a_26       75278123   162435268.0
3822    Scpiz6a_73       74911545   237713391.0
3821    Scpiz6a_37       69231668   312624936.0
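
The `genome_scale` column shown above is a running offset: `cumsum()` followed by `shift(1)` gives each scaffold the summed length of all scaffolds listed before it, and the first scaffold gets 0. A plain-Python sketch of the same computation, using the first three scaffold sizes from the table, is given below. The helper name `scaffold_offsets` is hypothetical, and adding a per-scaffold position to the offset is an assumption about how the dictionary is used later on.

```python
def scaffold_offsets(sizes):
    """Map each scaffold to the summed length of all scaffolds listed before it."""
    offsets = {}
    running = 0
    for name, size in sizes:
        offsets[name] = running   # offset is the total length seen so far
        running += size
    return offsets

# First three scaffolds and sizes, copied from the table above
sizes = [('Scpiz6a_49', 85027298),
         ('Scpiz6a_55.1', 77407970),
         ('Scpiz6a_26', 75278123)]
offsets = scaffold_offsets(sizes)

# Assumed use: genome-wide coordinate = scaffold offset + position on the scaffold
genome_pos = offsets['Scpiz6a_55.1'] + 651
print(offsets['Scpiz6a_26'], genome_pos)
```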
Out[324]:
{'Scpiz6a_3775': 1639022565.0,
 'Scpiz6a_1710': 1639028019.0,
 'Scpiz6a_1711': 1636688900.0,
 'Scpiz6a_1712': 1636222427.0,
 'Scpiz6a_1713': 1639520853.0,
 'Scpiz6a_1714': 1639254835.0,
 'Scpiz6a_1715': 1638962284.0,
 'Scpiz6a_1716': 1638685159.0,
 'Scpiz6a_1717': 1637338467.0,
 'Scpiz6a_1718': 1638778443.0,
 'Scpiz6a_1719': 1637647215.0,
 'Scpiz6a_617': 1637685041.0,
 'Scpiz6a_1481': 1637831127.0,
 'Scpiz6a_615': 1637164824.0,
 'Scpiz6a_614': 1637186843.0,
 'Scpiz6a_613': 1634735097.0,
 'Scpiz6a_612': 1634710713.0,
 'Scpiz6a_1486': 1638634444.0,
 'Scpiz6a_610': 1630834129.0,
 'Scpiz6a_1488': 1638073634.0,
 'Scpiz6a_1489': 1639352546.0,
 'Scpiz6a_619': 1637639083.0,
 'Scpiz6a_618': 1634257302.0,
 'Scpiz6a_1650': 1639000713.0,
 'Scpiz6a_2061': 1639465599.0,
 'Scpiz6a_1654': 1638824618.0,
 'Scpiz6a_103': 1637491959.0,
 'Scpiz6a_236': 1638854824.0,
 'Scpiz6a_1628': 1637469657.0,
 'Scpiz6a_1629': 1634572756.0,
 'Scpiz6a_1158': 1635199980.0,
 'Scpiz6a_1159': 1634700916.0,
 'Scpiz6a_1624': 1635402501.0,
 'Scpiz6a_1157': 1633609584.0,
 'Scpiz6a_1154': 1638864854.0,
 'Scpiz6a_1155': 1636341192.0,
 'Scpiz6a_1152': 1634952594.0,
 'Scpiz6a_1153': 1638658707.0,
 'Scpiz6a_1150': 1634897027.0,
 'Scpiz6a_1623': 1635467812.0,
 'Scpiz6a_729': 1638872646.0,
 'Scpiz6a_2262': 1637256707.0,
 'Scpiz6a_1398': 1638815628.0,
 'Scpiz6a_1399': 1638836947.0,
 'Scpiz6a_1394': 1637765139.0,
 'Scpiz6a_722': 1637044331.0,
 'Scpiz6a_721': 1635975188.0,
 'Scpiz6a_720': 1636648737.0,
 'Scpiz6a_1390': 1639041076.0,
 'Scpiz6a_1391': 1633343192.0,
 'Scpiz6a_1392': 1635390132.0,
 'Scpiz6a_724': 1634104860.0,
 'Scpiz6a_105': 1593914005.0,
 'Scpiz6a_104': 1633720801.0,
 'Scpiz6a_1899': 1637713220.0,
 'Scpiz6a_1898': 1637742552.0,
 'Scpiz6a_266': 1639380275.0,
 'Scpiz6a_267': 1638411903.0,
 'Scpiz6a_264': 1490215262.0,
 'Scpiz6a_265': 1635914618.0,
 'Scpiz6a_262': 1639409878.0,
 'Scpiz6a_263': 1513297986.0,
 'Scpiz6a_260': 1637056350.0,
 'Scpiz6a_261': 1634998264.0,
 'Scpiz6a_2407': 1639312200.0,
 'Scpiz6a_1895': 1636962637.0,
 'Scpiz6a_268': 1635542199.0,
 'Scpiz6a_269': 1614479145.0,
 'Scpiz6a_460': 1634469674.0,
 'Scpiz6a_461': 1634518226.0,
 'Scpiz6a_462': 1583396600.0,
 'Scpiz6a_463': 1636564490.0,
 'Scpiz6a_464': 1634152996.0,
 'Scpiz6a_465': 1636004285.0,
 'Scpiz6a_466': 1633741570.0,
 'Scpiz6a_467': 1634759326.0,
 'Scpiz6a_468': 1636427463.0,
 'Scpiz6a_2498': 1638604312.0,
 'Scpiz6a_1020': 1639467615.0,
 'Scpiz6a_1021': 1637828497.0,
 'Scpiz6a_1026': 1638288477.0,
 'Scpiz6a_1027': 1636508830.0,
 'Scpiz6a_1024': 1637533584.0,
 'Scpiz6a_1892': 1637362653.0,
 'Scpiz6a_1891': 1639471644.0,
 'Scpiz6a_751': 1637334192.0,
 'Scpiz6a_900': 1639104929.0,
 'Scpiz6a_1480': 1639148920.0,
 'Scpiz6a_616': 1638054626.0,
 'Scpiz6a_1482': 1637927717.0,
 'Scpiz6a_1483': 1637926421.0,
 'Scpiz6a_1484': 1637859998.0,
 'Scpiz6a_907': 1637580439.0,
 'Scpiz6a_1485': 1635092543.0,
 'Scpiz6a_2618': 1638819001.0,
 'Scpiz6a_2619': 1636497297.0,
 'Scpiz6a_611': 1639415986.0,
 'Scpiz6a_3102': 1636040380.0,
 'Scpiz6a_2610': 1636983930.0,
 'Scpiz6a_2611': 1639194711.0,
 'Scpiz6a_2612': 1633577153.0,
 'Scpiz6a_97': 1639064949.0,
 'Scpiz6a_2614': 1639359753.0,
 'Scpiz6a_3101': 1638534173.0,
 'Scpiz6a_2616': 1635634980.0,
 'Scpiz6a_2617': 1637943245.0,
 'Scpiz6a_594': 1637782359.0,
 'Scpiz6a_595': 1637392426.0,
 'Scpiz6a_596': 1635573967.0,
 'Scpiz6a_597': 1620810945.0,
 'Scpiz6a_590': 1637302757.0,
 'Scpiz6a_591': 1636888975.0,
 'Scpiz6a_592': 1637543261.0,
 'Scpiz6a_593': 1457337151.0,
 'Scpiz6a_2980': 1639158534.0,
 'Scpiz6a_906': 1639066032.0,
 'Scpiz6a_598': 1638775056.0,
 'Scpiz6a_599': 1638516502.0,
 'Scpiz6a_3106': 1637886118.0,
 'Scpiz6a_2430': 1635662321.0,
 'Scpiz6a_2431': 1636986966.0,
 'Scpiz6a_2120': 1636307007.0,
 'Scpiz6a_2433': 1636054744.0,
 'Scpiz6a_2126': 1637607803.0,
 'Scpiz6a_3105': 1635530230.0,
 'Scpiz6a_2124': 1632900011.0,
 'Scpiz6a_2794': 1634783399.0,
 'Scpiz6a_2438': 1637775741.0,
 'Scpiz6a_2439': 1638301906.0,
 'Scpiz6a_2128': 1639296617.0,
 'Scpiz6a_2129': 1637030789.0,
 'Scpiz6a_2797': 1639525853.0,
 'Scpiz6a_296': 1639303895.0,
 'Scpiz6a_2796': 1635431185.0,
 'Scpiz6a_905': 1638661013.0,
 'Scpiz6a_2791': 1639487750.0,
 'Scpiz6a_2790': 1633882142.0,
 'Scpiz6a_2021': 1637718561.0,
 'Scpiz6a_2058': 1638702336.0,
 'Scpiz6a_2020': 1639292456.0,
 'Scpiz6a_2724': 1635897973.0,
 'Scpiz6a_289': 1623755943.0,
 'Scpiz6a_2726': 1638100123.0,
 'Scpiz6a_2055': 1635371536.0,
 'Scpiz6a_2052': 1636493995.0,
 'Scpiz6a_2721': 1636136613.0,
 'Scpiz6a_2722': 1635960573.0,
 'Scpiz6a_2723': 1636971773.0,
 'Scpiz6a_280': 1637794253.0,
 'Scpiz6a_281': 1636588884.0,
 'Scpiz6a_282': 1638656400.0,
 'Scpiz6a_283': 1634890020.0,
 'Scpiz6a_2728': 1634703368.0,
 'Scpiz6a_2729': 1635052399.0,
 'Scpiz6a_286': 1635134590.0,
 'Scpiz6a_287': 1637592769.0,
 'Scpiz6a_3231': 1637060849.0,
 'Scpiz6a_3230': 1637388182.0,
 'Scpiz6a_3233': 1635221531.0,
 'Scpiz6a_3232': 1635236566.0,
 'Scpiz6a_2548': 1636289834.0,
 'Scpiz6a_2549': 1635899825.0,
 'Scpiz6a_3237': 1638161429.0,
 'Scpiz6a_3236': 1632719930.0,
 'Scpiz6a_3239': 1634737525.0,
 'Scpiz6a_3238': 1637536350.0,
 'Scpiz6a_2546': 1636349713.0,
 'Scpiz6a_2547': 1638760351.0,
 'Scpiz6a_2540': 1633632827.0,
 'Scpiz6a_2541': 1634603406.0,
 'Scpiz6a_2542': 1634636321.0,
 'Scpiz6a_2543': 1635306763.0,
 'Scpiz6a_3054': 1638729710.0,
 'Scpiz6a_3235': 1638919235.0,
 'Scpiz6a_2640': 1639378228.0,
 'Scpiz6a_428.1': 1631638909.0,
 'Scpiz6a_3294': 1635121361.0,
 'Scpiz6a_132': 1631155467.0,
 'Scpiz6a_79': 1635296208.0,
 'Scpiz6a_130': 1639486744.0,
 'Scpiz6a_131': 1639469630.0,
 'Scpiz6a_136': 1570486390.0,
 'Scpiz6a_137': 1636832959.0,
 'Scpiz6a_134': 1637921234.0,
 'Scpiz6a_135': 1637755844.0,
 'Scpiz6a_70': 1637496132.0,
 'Scpiz6a_71': 1636120695.0,
 'Scpiz6a_72': 655146831.0,
 'Scpiz6a_73': 237713391.0,
 'Scpiz6a_74': 880550878.0,
 'Scpiz6a_75': 1145830824.0,
 'Scpiz6a_76': 1636603481.0,
 'Scpiz6a_77': 1637158940.0,
 'Scpiz6a_3529': 1638797613.0,
 'Scpiz6a_3528': 1637123475.0,
 'Scpiz6a_3400': 1637150100.0,
 'Scpiz6a_3525': 1637248036.0,
 'Scpiz6a_3524': 1638343235.0,
 'Scpiz6a_3527': 1637011187.0,
 'Scpiz6a_3526': 1639477686.0,
 'Scpiz6a_3521': 1636348011.0,
 'Scpiz6a_3520': 1637652633.0,
 'Scpiz6a_3523': 1638018986.0,
 'Scpiz6a_3522': 1638934715.0,
 'Scpiz6a_3401': 1636752665.0,
 'Scpiz6a_755': 1636622902.0,
 'Scpiz6a_1071': 1638077427.0,
 'Scpiz6a_3020': 1639462575.0,
 'Scpiz6a_832': 1638871534.0,
 'Scpiz6a_2144': 1639500817.0,
 'Scpiz6a_868': 1638983180.0,
 'Scpiz6a_869': 1637048840.0,
 'Scpiz6a_715': 1639207426.0,
 'Scpiz6a_864': 1635141184.0,
 'Scpiz6a_865': 1637292721.0,
 'Scpiz6a_866': 1636521981.0,
 'Scpiz6a_867': 1635367392.0,
 'Scpiz6a_860': 1635890556.0,
 'Scpiz6a_861': 1638825741.0,
 'Scpiz6a_862': 1636848581.0,
 'Scpiz6a_863': 1638827985.0,
 'Scpiz6a_3419': 1635781653.0,
 'Scpiz6a_3418': 1637259595.0,
 'Scpiz6a_3149': 1639251688.0,
 'Scpiz6a_3148': 1637816656.0,
 'Scpiz6a_3411': 1637220452.0,
 'Scpiz6a_3146': 1636526905.0,
 'Scpiz6a_3145': 1637278343.0,
 'Scpiz6a_3144': 1637412173.0,
 'Scpiz6a_3415': 1635357006.0,
 'Scpiz6a_3142': 1636505539.0,
 'Scpiz6a_3417': 1635184816.0,
 'Scpiz6a_3140': 1634875959.0,
 'Scpiz6a_2940': 1634705818.0,
 'Scpiz6a_2941': 1638812253.0,
 'Scpiz6a_2942': 1634562477.0,
 'Scpiz6a_2943': 1635603572.0,
 'Scpiz6a_2944': 1637429050.0,
 'Scpiz6a_2945': 1638591521.0,
 'Scpiz6a_2946': 1639428185.0,
 'Scpiz6a_2947': 1638153952.0,
 'Scpiz6a_2948': 1635145575.0,
 'Scpiz6a_2949': 1638265230.0,
 'Scpiz6a_1094': 1637850822.0,
 'Scpiz6a_3099': 1636709689.0,
 'Scpiz6a_3098': 1638560001.0,
 'Scpiz6a_3095': 1638567020.0,
 'Scpiz6a_3094': 1638901514.0,
 'Scpiz6a_3097': 1638010054.0,
 'Scpiz6a_3096': 1637407948.0,
 'Scpiz6a_3091': 1638472708.0,
 'Scpiz6a_3090': 1634892357.0,
 'Scpiz6a_3093': 1636192815.0,
 'Scpiz6a_3092': 1638881540.0,
 'Scpiz6a_1817': 1636523623.0,
 'Scpiz6a_1816': 1635437310.0,
 'Scpiz6a_1815': 1636056536.0,
 'Scpiz6a_1814': 1637405129.0,
 'Scpiz6a_1365': 1635534224.0,
 'Scpiz6a_1364': 1638540053.0,
 'Scpiz6a_1811': 1638608955.0,
 'Scpiz6a_1366': 1636238037.0,
 'Scpiz6a_1369': 1638755821.0,
 'Scpiz6a_1368': 1638322606.0,
 'Scpiz6a_1819': 1634666292.0,
 'Scpiz6a_1818': 1639279944.0,
 'Scpiz6a_910': 1633844828.0,
 'Scpiz6a_1566': 1636517050.0,
 'Scpiz6a_912': 1636182319.0,
 'Scpiz6a_913': 1637707868.0,
 'Scpiz6a_914': 1634975493.0,
 'Scpiz6a_1562': 1638234530.0,
 'Scpiz6a_916': 1636803183.0,
 'Scpiz6a_1560': 1634288832.0,
 'Scpiz6a_918': 1637138281.0,
 'Scpiz6a_919': 1636760597.0,
 'Scpiz6a_1626': 1639437312.0,
 'Scpiz6a_1569': 1639269496.0,
 'Scpiz6a_1568': 1637595505.0,
 'Scpiz6a_3705': 1639403764.0,
 'Scpiz6a_3704': 1636621284.0,
 'Scpiz6a_3707': 1637008166.0,
 'Scpiz6a_1627': 1639236975.0,
 'Scpiz6a_3701': 1636141904.0,
 'Scpiz6a_3700': 1638584532.0,
 'Scpiz6a_3703': 1635817607.0,
 'Scpiz6a_3702': 1636634217.0,
 'Scpiz6a_1620': 1639264265.0,
 'Scpiz6a_3709': 1638403516.0,
 'Scpiz6a_3708': 1637766466.0,
 'Scpiz6a_1621': 1639337063.0,
 'Scpiz6a_1523': 1639425137.0,
 'Scpiz6a_1969': 1637667506.0,
 'Scpiz6a_1968': 1634246374.0,
 'Scpiz6a_1622': 1637635015.0,
 'Scpiz6a_1522': 1639033466.0,
 'Scpiz6a_1963': 1639496800.0,
 'Scpiz6a_1962': 1639324642.0,
 'Scpiz6a_1961': 1638793105.0,
 'Scpiz6a_1960': 1634197357.0,
 'Scpiz6a_1967': 1635251545.0,
 'Scpiz6a_1918': 1636873478.0,
 'Scpiz6a_1965': 1639227492.0,
 'Scpiz6a_1964': 1635083654.0,
 'Scpiz6a_957': 1636826703.0,
 'Scpiz6a_1527': 1638944653.0,
 'Scpiz6a_1526': 1638593849.0,
 'Scpiz6a_1703': 1639152127.0,
 'Scpiz6a_1702': 1638857054.0,
 'Scpiz6a_1701': 1639502825.0,
 'Scpiz6a_1700': 1634887679.0,
 'Scpiz6a_1707': 1637356969.0,
 'Scpiz6a_1706': 1637053348.0,
 'Scpiz6a_1705': 1636725635.0,
 'Scpiz6a_1704': 1636270881.0,
 'Scpiz6a_1709': 1634598316.0,
 'Scpiz6a_1708': 1638003665.0,
 'Scpiz6a_604': 1636189318.0,
 'Scpiz6a_1492': 1632448361.0,
 'Scpiz6a_1491': 1637936780.0,
 'Scpiz6a_607': 1634161274.0,
 'Scpiz6a_1497': 1636155986.0,
 'Scpiz6a_601': 1638236992.0,
 'Scpiz6a_1495': 1637233539.0,
 'Scpiz6a_603': 1634821633.0,
 'Scpiz6a_2337': 1639172392.0,
 'Scpiz6a_1499': 1637359811.0,
 'Scpiz6a_1498': 1636318994.0,
 'Scpiz6a_608': 1634496852.0,
 'Scpiz6a_609': 1633800367.0,
 'Scpiz6a_723': 1638529465.0,
 'Scpiz6a_1395': 1637965169.0,
 'Scpiz6a_1396': 1637970315.0,
 'Scpiz6a_2243': 1636700104.0,
 'Scpiz6a_1397': 1637880904.0,
 'Scpiz6a_2242': 1639349454.0,
 'Scpiz6a_727': 1639274723.0,
 'Scpiz6a_2241': 1636129546.0,
 'Scpiz6a_726': 1636353115.0,
 'Scpiz6a_1149': 1638260329.0,
 'Scpiz6a_1148': 1634768976.0,
 'Scpiz6a_3315': 1637962593.0,
 'Scpiz6a_725': 1637721231.0,
 'Scpiz6a_1639': 1639068198.0,
 'Scpiz6a_3740': 1635225831.0,
 'Scpiz6a_1141': 1636609957.0,
 'Scpiz6a_1636': 1634351040.0,
 'Scpiz6a_1143': 1638448914.0,
 'Scpiz6a_1142': 1635813838.0,
 'Scpiz6a_1145': 1638865968.0,
 'Scpiz6a_3743': 1635675947.0,
 'Scpiz6a_1147': 1635223682.0,
 'Scpiz6a_1146': 1636770105.0,
 'Scpiz6a_738': 1638243142.0,
 'Scpiz6a_739': 1637059350.0,
 'Scpiz6a_3034': 1639243288.0,
 'Scpiz6a_3310': 1638133980.0,
 'Scpiz6a_1389': 1639275768.0,
 'Scpiz6a_1388': 1637448685.0,
 'Scpiz6a_730': 1636722449.0,
 'Scpiz6a_731': 1637375427.0,
 'Scpiz6a_1385': 1636712880.0,
 'Scpiz6a_733': 1636033177.0,
 'Scpiz6a_734': 1638102639.0,
 'Scpiz6a_735': 1636111825.0,
 'Scpiz6a_736': 1635506181.0,
 'Scpiz6a_1380': 1634397852.0,
 'Scpiz6a_3031': 1637299891.0,
 'Scpiz6a_3030': 1638439378.0,
 'Scpiz6a_2328': 1636997578.0,
 'Scpiz6a_3423': 1638232066.0,
 'Scpiz6a_35': 1637515580.0,
 'Scpiz6a_2898': 1639293497.0,
 'Scpiz6a_455': 1631374720.0,
 'Scpiz6a_454': 1636948912.0,
 'Scpiz6a_1037': 1635158691.0,
 'Scpiz6a_1036': 1632625465.0,
 'Scpiz6a_1031': 1637355548.0,
 'Scpiz6a_1030': 1635059116.0,
 'Scpiz6a_453': 1634165388.0,
 'Scpiz6a_452': 1638637915.0,
 'Scpiz6a_3427': 1637351282.0,
 'Scpiz6a_1039': 1637827182.0,
 'Scpiz6a_458': 1616314001.0,
 'Scpiz6a_2917': 1639382321.0,
 'Scpiz6a_3429': 1636467502.0,
 'Scpiz6a_3584': 1638672526.0,
 'Scpiz6a_2609': 1639160668.0,
 'Scpiz6a_2608': 1635611436.0,
 'Scpiz6a_2603': 1635463754.0,
 'Scpiz6a_2602': 1638145219.0,
 'Scpiz6a_2601': 1637590031.0,
 'Scpiz6a_2600': 1636630986.0,
 'Scpiz6a_2607': 1636793752.0,
 'Scpiz6a_2606': 1638690892.0,
 'Scpiz6a_2605': 1638128980.0,
 'Scpiz6a_2604': 1634927248.0,
 'Scpiz6a_2423': 1629438344.0,
 'Scpiz6a_2422': 1636462513.0,
 'Scpiz6a_2421': 1638299466.0,
 'Scpiz6a_2420': 1636293274.0,
 'Scpiz6a_2427': 1637419209.0,
 'Scpiz6a_2426': 1633793927.0,
 'Scpiz6a_2425': 1633083921.0,
 'Scpiz6a_2424': 1639077939.0,
 'Scpiz6a_2429': 1638229601.0,
 'Scpiz6a_2428': 1634475163.0,
 'Scpiz6a_1897': 1638767141.0,
 'Scpiz6a_2041': 1634626238.0,
 'Scpiz6a_2736': 1636839212.0,
 'Scpiz6a_2735': 1632552725.0,
 'Scpiz6a_2042': 1636168284.0,
 'Scpiz6a_2045': 1638270129.0,
 'Scpiz6a_2732': 1635858909.0,
 'Scpiz6a_2731': 1637668857.0,
 'Scpiz6a_2046': 1635681780.0,
 'Scpiz6a_2049': 1638913700.0,
 'Scpiz6a_2048': 1637210246.0,
 'Scpiz6a_2739': 1637378263.0,
 'Scpiz6a_2738': 1636205032.0,
 'Scpiz6a_3045': 1638975490.0,
 'Scpiz6a_2559': 1635847671.0,
 'Scpiz6a_2558': 1635125778.0,
 'Scpiz6a_2557': 1634541746.0,
 'Scpiz6a_2556': 1637365493.0,
 'Scpiz6a_2555': 1633912899.0,
 'Scpiz6a_2554': 1634648849.0,
 'Scpiz6a_2553': 1638709190.0,
 'Scpiz6a_2552': 1639085506.0,
 'Scpiz6a_2551': 1635989764.0,
 'Scpiz6a_2550': 1635964228.0,
 'Scpiz6a_2148': 1638087529.0,
 'Scpiz6a_93': 1635136790.0,
 'Scpiz6a_90': 1327567734.0,
 'Scpiz6a_147': 912569336.0,
 'Scpiz6a_146': 449721910.0,
 'Scpiz6a_145': 1636881232.0,
 'Scpiz6a_144': 1634993721.0,
 'Scpiz6a_2139': 1635838286.0,
 'Scpiz6a_142': 1634641345.0,
 'Scpiz6a_141': 1632082201.0,
 'Scpiz6a_68': 1606102572.0,
 'Scpiz6a_2135': 1635821371.0,
 'Scpiz6a_66': 1355282223.0,
 'Scpiz6a_65': 561993337.0,
 'Scpiz6a_2136': 1633052192.0,
 'Scpiz6a_2131': 1635966055.0,
 'Scpiz6a_62': 1634894693.0,
 'Scpiz6a_149': 1636561231.0,
 'Scpiz6a_60': 1538020002.0,
 'Scpiz6a_3558': 1636970251.0,
 'Scpiz6a_3559': 1639013831.0,
 'Scpiz6a_94': 1636580764.0,
 'Scpiz6a_3550': 1638414297.0,
 'Scpiz6a_3551': 1636944328.0,
 'Scpiz6a_3552': 1634565048.0,
 'Scpiz6a_153': 1637952282.0,
 'Scpiz6a_3554': 1635034448.0,
 'Scpiz6a_1028': 1635774058.0,
 'Scpiz6a_3556': 1638806624.0,
 'Scpiz6a_3557': 1638517682.0,
 'Scpiz6a_2140': 1638954575.0,
 'Scpiz6a_1029': 1636811034.0,
 'Scpiz6a_3398': 1634847732.0,
 'Scpiz6a_3399': 1637430455.0,
 'Scpiz6a_3396': 1637689076.0,
 'Scpiz6a_3397': 1635644762.0,
 'Scpiz6a_3394': 1635392195.0,
 'Scpiz6a_3395': 1638736532.0,
 'Scpiz6a_3392': 1636592130.0,
 'Scpiz6a_98': 1637236443.0,
 'Scpiz6a_3390': 1637528049.0,
 'Scpiz6a_3391': 1637105646.0,
 'Scpiz6a_1985': 1637317061.0,
 'Scpiz6a_99': 1590008057.0,
 'Scpiz6a_1984': 1636325831.0,
 'Scpiz6a_2412': 1638053357.0,
 'Scpiz6a_1987': 1638297026.0,
 'Scpiz6a_2145': 1638649476.0,
 'Scpiz6a_1022': 1636439177.0,
 'Scpiz6a_1454': 1638749020.0,
 'Scpiz6a_2410': 1636614812.0,
 'Scpiz6a_1023': 1638712616.0,
 'Scpiz6a_1981': 1639140365.0,
 'Scpiz6a_2147': 1637396664.0,
 'Scpiz6a_1452': 1637811392.0,
 'Scpiz6a_851': 1636530183.0,
 'Scpiz6a_850': 1637821920.0,
 'Scpiz6a_853': 1638262780.0,
 'Scpiz6a_852': 1634908676.0,
 'Scpiz6a_855': 1637630945.0,
 'Scpiz6a_854': 1633455330.0,
 'Scpiz6a_857': 1638249287.0,
 'Scpiz6a_856': 1635871966.0,
 'Scpiz6a_859': 1632881707.0,
 'Scpiz6a_858': 1634086614.0,
 'Scpiz6a_1450': 1637457082.0,
 'Scpiz6a_3178': 1636022355.0,
 'Scpiz6a_3179': 1637459879.0,
 'Scpiz6a_3172': 1638613595.0,
 'Scpiz6a_3173': 1639055189.0,
 'Scpiz6a_3170': 1635461725.0,
 'Scpiz6a_3171': 1636400548.0,
 'Scpiz6a_3176': 1639490768.0,
 'Scpiz6a_3177': 1639030199.0,
 'Scpiz6a_3174': 1639329821.0,
 'Scpiz6a_3175': 1635987947.0,
 'Scpiz6a_2953': 1639304934.0,
 'Scpiz6a_2952': 1637783681.0,
 'Scpiz6a_2951': 1634502231.0,
 'Scpiz6a_2950': 1638469142.0,
 'Scpiz6a_2957': 1635193491.0,
 'Scpiz6a_2956': 1638743348.0,
 'Scpiz6a_2955': 1639042163.0,
 'Scpiz6a_2954': 1634854809.0,
 'Scpiz6a_2959': 1635693422.0,
 'Scpiz6a_2958': 1634871267.0,
 'Scpiz6a_1822': 1637550166.0,
 'Scpiz6a_1823': 1638460819.0,
 'Scpiz6a_1820': 1635268609.0,
 'Scpiz6a_1821': 1634438845.0,
 'Scpiz6a_1826': 1636503892.0,
 'Scpiz6a_1827': 1635654525.0,
 'Scpiz6a_1824': 1635494112.0,
 'Scpiz6a_1825': 1633713795.0,
 'Scpiz6a_1828': 1638426251.0,
 'Scpiz6a_1829': 1639215882.0,
 'Scpiz6a_92': 1638528287.0,
 'Scpiz6a_3546': 1639163868.0,
 'Scpiz6a_1552': 1634843002.0,
 'Scpiz6a_924': 1636825138.0,
 'Scpiz6a_927': 1632696671.0,
 'Scpiz6a_926': 1636336074.0,
 'Scpiz6a_921': 1637958728.0,
 'Scpiz6a_1557': 1633171268.0,
 'Scpiz6a_1554': 1636500596.0,
 'Scpiz6a_922': 1638230834.0,
 'Scpiz6a_886': 1638034278.0,
 'Scpiz6a_1558': 1637698487.0,
 'Scpiz6a_1559': 1639407840.0,
 'Scpiz6a_929': 1634950296.0,
 'Scpiz6a_928': 1636016940.0,
 'Scpiz6a_3630': 1638525931.0,
 'Scpiz6a_3770': 1637504472.0,
 'Scpiz6a_3771': 1637989574.0,
 'Scpiz6a_3772': 1636814172.0,
 'Scpiz6a_3773': 1634913330.0,
 'Scpiz6a_3774': 1633894483.0,
 'Scpiz6a_3633': 1638600827.0,
 'Scpiz6a_3776': 1639225382.0,
 'Scpiz6a_61': 1637565329.0,
 'Scpiz6a_3778': 1637840322.0,
 'Scpiz6a_3779': 1639385390.0,
 'Scpiz6a_885': 1638142723.0,
 'Scpiz6a_1956': 1639285163.0,
 'Scpiz6a_1957': 1637042827.0,
 'Scpiz6a_1954': 1635866374.0,
 'Scpiz6a_1955': 1635217228.0,
 'Scpiz6a_1952': 1633535263.0,
 'Scpiz6a_1953': 1639288291.0,
 'Scpiz6a_1950': 1637833755.0,
 'Scpiz6a_1951': 1635747392.0,
 'Scpiz6a_1958': 1636559601.0,
 'Scpiz6a_1959': 1635595693.0,
 'Scpiz6a_2132': 1639059529.0,
 'Scpiz6a_1738': 1636747902.0,
 'Scpiz6a_1739': 1638575199.0,
 'Scpiz6a_1736': 1634938791.0,
 'Scpiz6a_1737': 1635416874.0,
 'Scpiz6a_1734': 1637018733.0,
 'Scpiz6a_1735': 1634520852.0,
 'Scpiz6a_1732': 1639424121.0,
 'Scpiz6a_1733': 1638682865.0,
 'Scpiz6a_1730': 1634986894.0,
 'Scpiz6a_1731': 1639038902.0,
 'Scpiz6a_639': 1635997031.0,
 'Scpiz6a_638': 1634406797.0,
 'Scpiz6a_3047': 1638408310.0,
 'Scpiz6a_631': 1634208904.0,
 'Scpiz6a_630': 1634600862.0,
 'Scpiz6a_633': 1635984308.0,
 'Scpiz6a_632': 1636704899.0,
 'Scpiz6a_635': 1638737669.0,
 'Scpiz6a_634': 1634127117.0,
 'Scpiz6a_637': 1635804393.0,
 'Scpiz6a_636': 1636477456.0,
 'Scpiz6a_2899': 1637116056.0,
 'Scpiz6a_177.1': 1631079089.0,
 'Scpiz6a_3609': 1637579067.0,
 'Scpiz6a_1924': 1638366208.0,
 'Scpiz6a_3604': 1638158938.0,
 'Scpiz6a_3605': 1639298697.0,
 'Scpiz6a_3606': 1638639072.0,
 'Scpiz6a_3607': 1633960896.0,
 'Scpiz6a_3600': 1638845894.0,
 'Scpiz6a_3601': 1637192703.0,
 'Scpiz6a_3602': 1636847020.0,
 'Scpiz6a_3603': 1636086913.0,
 'Scpiz6a_1449': 1639025838.0,
 'Scpiz6a_3386': 1637993420.0,
 'Scpiz6a_2624': 1638228368.0,
 'Scpiz6a_1608': 1634980059.0,
 'Scpiz6a_1609': 1635469841.0,
 'Scpiz6a_2419': 1638718322.0,
 'Scpiz6a_1602': 1638132730.0,
 'Scpiz6a_1603': 1635666216.0,
 'Scpiz6a_1600': 1635398382.0,
 'Scpiz6a_1601': 1638213546.0,
 'Scpiz6a_1606': 1638334753.0,
 'Scpiz6a_1607': 1637468261.0,
 'Scpiz6a_1604': 1635703108.0,
 'Scpiz6a_1605': 1637189773.0,
 'Scpiz6a_709': 1634177644.0,
 'Scpiz6a_708': 1637739891.0,
 'Scpiz6a_705': 1638290921.0,
 'Scpiz6a_704': 1636201546.0,
 'Scpiz6a_707': 1632263361.0,
 'Scpiz6a_706': 1639179841.0,
 'Scpiz6a_701': 1639191528.0,
 'Scpiz6a_700': 1632935983.0,
 'Scpiz6a_703': 1636804754.0,
 'Scpiz6a_702': 1636899788.0,
 'Scpiz6a_423.1': 1613339411.0,
 'Scpiz6a_448': 1636666432.0,
 'Scpiz6a_449': 1633285286.0,
 'Scpiz6a_442': 1631907437.0,
 'Scpiz6a_443': 1635047918.0,
 'Scpiz6a_440': 1636660002.0,
 'Scpiz6a_441': 1638416691.0,
 'Scpiz6a_446': 1638405914.0,
 'Scpiz6a_447': 1621891978.0,
 'Scpiz6a_444': 1636796898.0,
 'Scpiz6a_445': 1635962401.0,
 'Scpiz6a_1040': 1638227134.0,
 'Scpiz6a_55.2': 1573213224.0,
 'Scpiz6a_55.1': 85027298.0,
 'Scpiz6a_1043': 1639365921.0,
 'Scpiz6a_1044': 1637133841.0,
 'Scpiz6a_1045': 1636232838.0,
 'Scpiz6a_1046': 1638196213.0,
 'Scpiz6a_1047': 1636915195.0,
 'Scpiz6a_1048': 1635206455.0,
 'Scpiz6a_1049': 1631752598.0,
 'Scpiz6a_253.1': 1634017894.0,
 'Scpiz6a_2751': 1638438185.0,
 'Scpiz6a_2100': 1638531819.0,
 'Scpiz6a_2750': 1634628761.0,
 'Scpiz6a_2753': 1639214826.0,
 'Scpiz6a_2752': 1637113084.0,
 'Scpiz6a_1421': 1639508841.0,
 'Scpiz6a_1156': 1637111597.0,
 'Scpiz6a_217': 1638833589.0,
 'Scpiz6a_2754': 1636896702.0,
 'Scpiz6a_2636': 1637062348.0,
 'Scpiz6a_2637': 1638377063.0,
 'Scpiz6a_2634': 1637519740.0,
 'Scpiz6a_2635': 1635627142.0,
 'Scpiz6a_2632': 1635304657.0,
 'Scpiz6a_2633': 1636378571.0,
 'Scpiz6a_2630': 1638246829.0,
 'Scpiz6a_2631': 1638346867.0,
 'Scpiz6a_2756': 1635232275.0,
 'Scpiz6a_2638': 1637572203.0,
 'Scpiz6a_2639': 1638536525.0,
 'Scpiz6a_2613': 1639360781.0,
 'Scpiz6a_2388': 1638331113.0,
 'Scpiz6a_2389': 1639361809.0,
 'Scpiz6a_2386': 1637337042.0,
 'Scpiz6a_2387': 1636672857.0,
 'Scpiz6a_2384': 1636104717.0,
 'Scpiz6a_2385': 1635980663.0,
 'Scpiz6a_2382': 1634427393.0,
 'Scpiz6a_2383': 1638293364.0,
 'Scpiz6a_2380': 1636738367.0,
 'Scpiz6a_2381': 1636606719.0,
 'Scpiz6a_435.1': 1629598643.0,
 'Scpiz6a_60.1': 1632741856.0,
 'Scpiz6a_2702': 1636412339.0,
 'Scpiz6a_2703': 1637389597.0,
 'Scpiz6a_2700': 1638224666.0,
 'Scpiz6a_2701': 1638686306.0,
 'Scpiz6a_2706': 1637651279.0,
 'Scpiz6a_2707': 1637102668.0,
 'Scpiz6a_2704': 1635971536.0,
 'Scpiz6a_2705': 1638136478.0,
 'Scpiz6a_2708': 1638204885.0,
 'Scpiz6a_2709': 1635794927.0,
 'Scpiz6a_2568': 1638585697.0,
 'Scpiz6a_2569': 1637274022.0,
 'Scpiz6a_2358': 1639485738.0,
 'Scpiz6a_2562': 1636890521.0,
 'Scpiz6a_2563': 1637625507.0,
 'Scpiz6a_2560': 1634447331.0,
 'Scpiz6a_2561': 1637410765.0,
 'Scpiz6a_2566': 1638113948.0,
 'Scpiz6a_2567': 1637763812.0,
 'Scpiz6a_2564': 1635768358.0,
 'Scpiz6a_2565': 1636798470.0,
 'Scpiz6a_2098': 1635599634.0,
 'Scpiz6a_2099': 1638842540.0,
 'Scpiz6a_2092': 1639521853.0,
 'Scpiz6a_2093': 1638450105.0,
 'Scpiz6a_2090': 1639331891.0,
 'Scpiz6a_3041': 1638250515.0,
 'Scpiz6a_2096': 1637476636.0,
 'Scpiz6a_2097': 1635570008.0,
 'Scpiz6a_2094': 1638986471.0,
 'Scpiz6a_2095': 1637540498.0,
 'Scpiz6a_154': 1635739730.0,
 'Scpiz6a_2149': 1635548173.0,
 'Scpiz6a_156': 1634744805.0,
 'Scpiz6a_91': 1636502245.0,
 'Scpiz6a_150': 1474288736.0,
 'Scpiz6a_151': 1638503518.0,
 'Scpiz6a_152': 1634866573.0,
 'Scpiz6a_95': 1635674001.0,
 'Scpiz6a_2416': 1638727434.0,
 'Scpiz6a_2141': 1635038941.0,
 'Scpiz6a_2142': 1636484075.0,
 'Scpiz6a_2143': 1635105862.0,
 'Scpiz6a_158': 1634376358.0,
 'Scpiz6a_159': 1636525264.0,
 'Scpiz6a_2146': 1634673750.0,
 'Scpiz6a_2411': 1636255341.0,
 'Scpiz6a_2122': 1638829106.0,
 'Scpiz6a_3549': 1635056879.0,
 'Scpiz6a_3548': 1636402235.0,
 'Scpiz6a_3163': 1637406539.0,
 'Scpiz6a_2123': 1639411914.0,
 'Scpiz6a_3543': 1637437475.0,
 'Scpiz6a_3542': 1639164934.0,
 'Scpiz6a_3541': 1638939134.0,
 'Scpiz6a_3540': 1635352840.0,
 'Scpiz6a_3547': 1635877552.0,
 'Scpiz6a_2432': 1637626867.0,
 'Scpiz6a_1928': 1637636371.0,
 'Scpiz6a_3544': 1638001106.0,
 'Scpiz6a_3389': 1639282032.0,
 'Scpiz6a_3388': 1638925871.0,
 'Scpiz6a_2890': 1639442373.0,
 'Scpiz6a_2891': 1636693705.0,
 'Scpiz6a_2896': 1639185156.0,
 'Scpiz6a_2897': 1638670226.0,
 'Scpiz6a_2894': 1634792997.0,
 'Scpiz6a_2895': 1635334055.0,
 'Scpiz6a_3381': 1634816872.0,
 'Scpiz6a_3380': 1634515579.0,
 'Scpiz6a_3383': 1635065821.0,
 'Scpiz6a_3382': 1612164559.0,
 'Scpiz6a_3385': 1636060117.0,
 'Scpiz6a_3384': 1635212922.0,
 'Scpiz6a_3387': 1634984616.0,
 'Scpiz6a_2127': 1637074324.0,
 'Scpiz6a_2436': 1636447523.0,
 'Scpiz6a_2962': 1637908243.0,
 'Scpiz6a_1151': 1636276055.0,
 'Scpiz6a_2125': 1637758500.0,
 'Scpiz6a_3168': 1636442516.0,
 'Scpiz6a_846': 1637480819.0,
 'Scpiz6a_847': 1636720855.0,
 'Scpiz6a_844': 1637904343.0,
 'Scpiz6a_845': 1637458481.0,
 'Scpiz6a_842': 1637114570.0,
 'Scpiz6a_843': 1636495646.0,
 'Scpiz6a_840': 1632823512.0,
 'Scpiz6a_841': 1636118923.0,
 'Scpiz6a_556': 1639348423.0,
 'Scpiz6a_366': 1638688600.0,
 'Scpiz6a_848': 1636510474.0,
 'Scpiz6a_849': 1635546183.0,
 'Scpiz6a_3165': 1636909039.0,
 'Scpiz6a_3164': 1637622786.0,
 'Scpiz6a_3167': 1638762615.0,
 'Scpiz6a_3166': 1636719260.0,
 'Scpiz6a_3161': 1638698906.0,
 'Scpiz6a_3160': 1636094042.0,
 'Scpiz6a_2968': 1634415699.0,
 'Scpiz6a_3162': 1637949702.0,
 'Scpiz6a_2966': 1634409765.0,
 'Scpiz6a_2967': 1635811951.0,
 'Scpiz6a_2964': 1634510256.0,
 'Scpiz6a_2965': 1634845368.0,
 'Scpiz6a_3169': 1634623712.0,
 'Scpiz6a_2963': 1635245131.0,
 'Scpiz6a_2960': 1635475921.0,
 'Scpiz6a_2961': 1638275025.0,
 'Scpiz6a_3414': 1638188774.0,
 'Scpiz6a_2808': 1639012738.0,
 'Scpiz6a_938': 1636420747.0,
 'Scpiz6a_939': 1635027692.0,
 'Scpiz6a_1549': 1637354126.0,
 'Scpiz6a_1548': 1634109346.0,
 'Scpiz6a_1545': 1634354229.0,
 'Scpiz6a_933': 1635329873.0,
 'Scpiz6a_1547': 1638509422.0,
 'Scpiz6a_1546': 1634646349.0,
 'Scpiz6a_1541': 1633445801.0,
 'Scpiz6a_937': 1634658824.0,
 'Scpiz6a_934': 1637778389.0,
 'Scpiz6a_935': 1637216082.0,
 'Scpiz6a_2414': 1637493350.0,
 'Scpiz6a_3763': 1638326254.0,
 'Scpiz6a_3762': 1637705190.0,
 'Scpiz6a_3761': 1637832441.0,
 'Scpiz6a_3760': 1635554142.0,
 'Scpiz6a_3767': 1637687731.0,
 'Scpiz6a_3766': 1635373605.0,
 'Scpiz6a_3765': 1635331964.0,
 'Scpiz6a_3764': 1638337179.0,
 'Scpiz6a_3769': 1636097602.0,
 'Scpiz6a_3768': 1637224818.0,
 'Scpiz6a_1207': 1637466865.0,
 'Scpiz6a_1941': 1637757172.0,
 'Scpiz6a_1940': 1638489323.0,
 'Scpiz6a_1943': 1639483726.0,
 'Scpiz6a_1942': 1635490077.0,
 'Scpiz6a_1945': 1638106411.0,
 'Scpiz6a_1944': 1637461277.0,
 'Scpiz6a_1947': 1637332766.0,
 'Scpiz6a_1946': 1636647126.0,
 'Scpiz6a_1949': 1636052952.0,
 'Scpiz6a_1948': 1639102773.0,
 'Scpiz6a_1729': 1638892642.0,
 'Scpiz6a_1728': 1634826391.0,
 'Scpiz6a_1721': 1633781034.0,
 'Scpiz6a_1720': 1638220961.0,
 'Scpiz6a_1723': 1634461344.0,
 'Scpiz6a_1722': 1633787484.0,
 'Scpiz6a_1725': 1636727228.0,
 'Scpiz6a_1724': 1635982486.0,
 'Scpiz6a_1727': 1637848198.0,
 'Scpiz6a_1726': 1637849510.0,
 'Scpiz6a_628': 1639021474.0,
 'Scpiz6a_629': 1638905950.0,
 'Scpiz6a_626': 1637857378.0,
 'Scpiz6a_627': 1630211994.0,
 'Scpiz6a_624': 1637914743.0,
 'Scpiz6a_625': 1636117150.0,
 'Scpiz6a_622': 1639238028.0,
 'Scpiz6a_623': 1637084776.0,
 'Scpiz6a_620': 1636191067.0,
 'Scpiz6a_621': 1639076857.0,
 'Scpiz6a_1079': 1638630973.0,
 'Scpiz6a_3619': 1638068570.0,
 'Scpiz6a_3618': 1637664804.0,
 'Scpiz6a_3617': 1638840304.0,
 'Scpiz6a_3616': 1635132388.0,
 'Scpiz6a_3615': 1639404783.0,
 'Scpiz6a_3614': 1637624147.0,
 'Scpiz6a_3613': 1636330953.0,
 'Scpiz6a_3612': 1635336145.0,
 'Scpiz6a_3611': 1639231709.0,
 'Scpiz6a_3610': 1637386767.0,
 'Scpiz6a_1839': 1639081182.0,
 'Scpiz6a_1838': 1637632303.0,
 'Scpiz6a_1835': 1638830227.0,
 'Scpiz6a_1834': 1638867082.0,
 'Scpiz6a_1837': 1635936726.0,
 'Scpiz6a_1836': 1635735889.0,
 'Scpiz6a_1831': 1636961114.0,
 'Scpiz6a_1830': 1635942242.0,
 'Scpiz6a_1833': 1638868195.0,
 'Scpiz6a_1832': 1635591747.0,
 'Scpiz6a_1934': 1637239347.0,
 'Scpiz6a_1427': 1638980984.0,
 'Scpiz6a_1784': 1638628658.0,
 'Scpiz6a_1615': 1634424509.0,
 'Scpiz6a_1614': 1636732004.0,
 'Scpiz6a_1617': 1639370029.0,
 'Scpiz6a_1616': 1635396320.0,
 'Scpiz6a_1611': 1635701172.0,
 'Scpiz6a_1610': 1638931400.0,
 'Scpiz6a_1613': 1635843919.0,
 'Scpiz6a_1612': 1636450858.0,
 'Scpiz6a_1930': 1638317740.0,
 'Scpiz6a_1619': 1635310975.0,
 'Scpiz6a_1618': 1634068026.0,
 'Scpiz6a_1423': 1636459185.0,
 'Scpiz6a_712': 1637395252.0,
 'Scpiz6a_288': 1634756910.0,
 'Scpiz6a_710': 1636490689.0,
 'Scpiz6a_711': 1636298429.0,
 'Scpiz6a_716': 1636701703.0,
 'Scpiz6a_717': 1635182644.0,
 'Scpiz6a_714': 1638352917.0,
 'Scpiz6a_2725': 1635730122.0,
 'Scpiz6a_1387': 1639071447.0,
 'Scpiz6a_718': 1637286974.0,
 'Scpiz6a_1933': 1637006655.0,
 'Scpiz6a_2054': 1639234869.0,
 'Scpiz6a_932': 1638082480.0,
 'Scpiz6a_324': 1638398716.0,
 'Scpiz6a_3706': 1639478693.0,
 'Scpiz6a_1544': 1638191254.0,
 'Scpiz6a_2359': 1638728572.0,
 'Scpiz6a_825': 1639034554.0,
 'Scpiz6a_930': 1638968890.0,
 'Scpiz6a_2053': 1636244964.0,
 'Scpiz6a_931': 1638852593.0,
 'Scpiz6a_2050': 1637023255.0,
 'Scpiz6a_1171': 1638240683.0,
 'Scpiz6a_1656': 1636515406.0,
 'Scpiz6a_936': 1637641795.0,
 'Scpiz6a_2051': 1639155331.0,
 'Scpiz6a_1540': 1635154323.0,
 'Scpiz6a_1543': 1637963881.0,
 'Scpiz6a_439': 1639344295.0,
 'Scpiz6a_438': 1639480707.0,
 'Scpiz6a_437': 1635787345.0,
 'Scpiz6a_436': 1638622868.0,
 'Scpiz6a_435': 1627809148.0,
 'Scpiz6a_434': 1597630794.0,
 'Scpiz6a_433': 1636294993.0,
 'Scpiz6a_432': 1639011645.0,
 'Scpiz6a_431': 1636465839.0,
 'Scpiz6a_430': 1638751288.0,
 'Scpiz6a_1053': 1637551546.0,
 'Scpiz6a_3684': 1635388068.0,
 'Scpiz6a_1051': 1636862600.0,
 'Scpiz6a_1050': 1637399486.0,
 'Scpiz6a_1057': 1634931871.0,
 'Scpiz6a_1056': 1636800041.0,
 'Scpiz6a_1055': 1638684012.0,
 'Scpiz6a_1054': 1636749490.0,
 'Scpiz6a_284': 1637644505.0,
 'Scpiz6a_1059': 1635697298.0,
 'Scpiz6a_1058': 1638562342.0,
 'Scpiz6a_3686': 1638264005.0,
 'Scpiz6a_285': 1637799532.0,
 'Scpiz6a_3687': 1639460557.0,
 'Scpiz6a_2353': 1637471053.0,
 'Scpiz6a_1371': 1639441361.0,
 'Scpiz6a_364': 1637771768.0,
 'Scpiz6a_3680': 1639096305.0,
 'Scpiz6a_2352': 1637465469.0,
 'Scpiz6a_1890': 1639308050.0,
 'Scpiz6a_3681': 1638782956.0,
 'Scpiz6a_2225': 1637923828.0,
 'Scpiz6a_3682': 1636696906.0,
 'Scpiz6a_3339': 1638811128.0,
 'Scpiz6a_834': 1638016434.0,
 'Scpiz6a_381': 1639479700.0,
 'Scpiz6a_380': 1634668780.0,
 'Scpiz6a_383': 1632042295.0,
 'Scpiz6a_382': 1561455177.0,
 'Scpiz6a_385': 1639131793.0,
 'Scpiz6a_384': 1634577889.0,
 'Scpiz6a_387': 1637865236.0,
 'Scpiz6a_386': 1638345657.0,
 'Scpiz6a_389': 1638379473.0,
 'Scpiz6a_388': 1637637727.0,
 'Scpiz6a_1052': 1638844776.0,
 'Scpiz6a_543': 1638166408.0,
 'Scpiz6a_2620': 1635873830.0,
 'Scpiz6a_2623': 1635186988.0,
 'Scpiz6a_540': 1634379467.0,
 'Scpiz6a_547': 1639018199.0,
 'Scpiz6a_546': 1637069838.0,
 'Scpiz6a_545': 1636178814.0,
 'Scpiz6a_544': 1638935820.0,
 'Scpiz6a_2629': 1639111388.0,
 'Scpiz6a_2628': 1634859516.0,
 'Scpiz6a_549': 1638997431.0,
 'Scpiz6a_548': 1633364850.0,
 'Scpiz6a_2544': 1638721741.0,
 'Scpiz6a_2545': 1638404715.0,
 'Scpiz6a_2399': 1634528712.0,
 'Scpiz6a_2398': 1635465783.0,
 'Scpiz6a_2391': 1638459629.0,
 'Scpiz6a_2390': 1634567619.0,
 'Scpiz6a_2246': 1638422668.0,
 'Scpiz6a_2392': 1639419037.0,
 'Scpiz6a_2395': 1636904414.0,
 'Scpiz6a_2394': 1636115376.0,
 'Scpiz6a_2397': 1637956150.0,
 'Scpiz6a_3335': 1636635832.0,
 'Scpiz6a_2615': 1639493785.0,
 'Scpiz6a_469': 1638902623.0,
 'Scpiz6a_3337': 1639123210.0,
 'Scpiz6a_2719': 1636422428.0,
 'Scpiz6a_2718': 1638320174.0,
 'Scpiz6a_2715': 1639159601.0,
 'Scpiz6a_2714': 1636424108.0,
 'Scpiz6a_2717': 1637679652.0,
 'Scpiz6a_2716': 1637015715.0,
 'Scpiz6a_2711': 1636024159.0,
 'Scpiz6a_2710': 1639141435.0,
 'Scpiz6a_2713': 1637417802.0,
 'Scpiz6a_2712': 1637311342.0,
 'Scpiz6a_2575': 1639256931.0,
 'Scpiz6a_2574': 1634491458.0,
 'Scpiz6a_2577': 1637035306.0,
 'Scpiz6a_2576': 1638607795.0,
 'Scpiz6a_2571': 1637512805.0,
 'Scpiz6a_2570': 1638079954.0,
 'Scpiz6a_2573': 1638800994.0,
 'Scpiz6a_2572': 1635266482.0,
 'Scpiz6a_30.1': 510652038.0,
 'Scpiz6a_2579': 1639009459.0,
 'Scpiz6a_2578': 1637925125.0,
 ...}
In [325]:
avgCovEqualHetSitesDF['genome_pos'] = avgCovEqualHetSitesDF.apply(lambda row: row['pos'] + scaffoldScaleDict[row['chrom']], axis=1)
avgCovEqualHetSitesDF['height']=[np.random.choice(a=np.arange(0.0,1.0,0.05)) for n in range(len(avgCovEqualHetSitesDF))]
avgCovEqualHetSitesDF['chrom_size'] = avgCovEqualHetSitesDF['chrom'].apply(lambda x: scaffoldSizeDict[x])
avgCovEqualHetSitesDF.head()
Out[325]:
        chrom      pos   A   C   G   T  cov        status   animal  allele_count  equal  genome_pos  height  chrom_size
0  Scpiz6a_49   357652   0  10   0  10   20    Transition  Atig003      (10, 10)   True    357652.0    0.85    85027298
1  Scpiz6a_49   533563   0  10   0  10   20    Transition  Atig003      (10, 10)   True    533563.0    0.35    85027298
2  Scpiz6a_49   533564  10  10   0   0   20  Transversion  Atig003      (10, 10)   True    533564.0    0.45    85027298
3  Scpiz6a_49  1076362  10   0   0  10   20  Transversion  Atig003      (10, 10)   True   1076362.0    0.45    85027298
4  Scpiz6a_49  1258626  10   0  10   0   20    Transition  Atig003      (10, 10)   True   1258626.0    0.25    85027298
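The genome_pos column places every site on one concatenated axis by adding a per-scaffold offset to its position within the scaffold. How scaffoldScaleDict was built is not shown in this section. The sketch below assumes each offset is the summed length of all larger scaffolds, with scaffolds ordered from largest to smallest. The helper name build_offsets and the toy sizes are hypothetical.

```python
def build_offsets(scaffold_sizes):
    """Map each scaffold to the summed length of the scaffolds placed before it."""
    # Assumed ordering: largest scaffold first.
    ordered = sorted(scaffold_sizes.items(), key=lambda kv: kv[1], reverse=True)
    offsets = {}
    running_total = 0
    for name, size in ordered:
        offsets[name] = running_total  # the first scaffold starts at 0
        running_total += size
    return offsets


toy_sizes = {'scaf_a': 500, 'scaf_b': 300, 'scaf_c': 200}  # hypothetical sizes
offsets = build_offsets(toy_sizes)

# A site at position 42 on scaf_b is shifted by the length of scaf_a.
genome_pos = 42 + offsets['scaf_b']
print(offsets, genome_pos)
```

Under this assumption the largest scaffold keeps its original coordinates, which agrees with genome_pos equalling pos for Scpiz6a_49 in the table above.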
In [326]:
avgCovEqualHetSitesDF[avgCovEqualHetSitesDF['chrom'] =='Scpiz6a_45'].groupby('animal').count()
Out[326]:
              chrom  pos    A    C    G    T  cov  status  allele_count  equal  genome_pos  height  chrom_size
animal
A.tig_12512      38   38   38   38   38   38   38      38            38     38          38      38          38
A.tig_12513      29   29   29   29   29   29   29      29            29     29          29      29          29
A.tig_9721      952  952  952  952  952  952  952     952           952    952         952     952         952
A_tigris8450     41   41   41   41   41   41   41      41            41     41          41      41          41
Atig001         629  629  629  629  629  629  629     629           629    629         629     629         629
Atig003         610  610  610  610  610  610  610     610           610    610         610     610         610
Atig_122        848  848  848  848  848  848  848     848           848    848         848     848         848
Atig_4278       807  807  807  807  807  807  807     807           807    807         807     807         807
Atig_6993        26   26   26   26   26   26   26      26            26     26          26      26          26
Atig_9177        28   28   28   28   28   28   28      28            28     28          28      28          28
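The equal column flags sites where two alleles are supported by the same number of reads, as in the (10, 10) allele counts shown earlier. Here is a minimal sketch of that test on plain dictionaries, mirroring the conditions used by check_equal_het and is_het in the script later in this notebook. The function name is_even_split is hypothetical, and the sketch assumes coverage is the sum of the four base counts.

```python
def is_even_split(base_counts, min_cov=9):
    """Return True when exactly two bases are seen, with equal read counts."""
    coverage = sum(base_counts.values())  # assumed to match the 'cov' column
    if coverage < min_cov:                # same cutoff as cov <= 8 -> False
        return False
    observed = [n for n in base_counts.values() if n > 0]
    return len(observed) == 2 and observed[0] == observed[1]


examples = [
    {'A': 0, 'C': 10, 'G': 0, 'T': 10},  # 10/10 split with enough coverage
    {'A': 7, 'C': 13, 'G': 0, 'T': 0},   # two alleles, uneven support
    {'A': 4, 'C': 0, 'G': 4, 'T': 0},    # even split, but coverage of only 8
    {'A': 5, 'C': 5, 'G': 5, 'T': 0},    # three alleles observed
]
results = [is_even_split(site) for site in examples]
print(results)
```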
In [327]:
sns.set(style='white')
fig,axarr = plt.subplots(10, 1, 
    sharex=True, 
    figsize=(10.5, 4), 
    dpi=200, 
    gridspec_kw = {'wspace':0.0, 'hspace':0.1}
)
for i, animal in enumerate(avgCovEqualHetSitesDF.animal.unique()):
    data = avgCovEqualHetSitesDF[(avgCovEqualHetSitesDF.animal == animal) & (avgCovEqualHetSitesDF.chrom_size > 1e6)]
    print(len(data))  # number of sites plotted for this animal
    ax = data.plot('genome_pos', 'height', style ='.', marker='.', markersize=2, color='black', legend=False, ax=axarr[i])
    ax.set_ylim(-0.5,1.5)
    ax.set_yticks([])
    ax.vlines(x=scaffoldSizes[scaffoldSizes.scaffold_size > 1e6].genome_scale, ymin=-0.5, ymax=1.5,lw=0.5,linestyles='solid',color='red')
    
plt.show()
598
30440
32832
46773
47695
52945
878
598
716
806

Pysam 10kb Non-Overlapping Windows

The following Python script applies non-overlapping 10 kb windows across every sequence in a profile file and counts the evenly split heterozygous sites (two alleles with equal read counts) in each window.

In [ ]:
# %load ../bin/scan_profile_no_lim.py
#!/usr/bin/env python
#Author: Duncan Tormey
#Email: dut@stowers.org or duncantormey@gmail.com
##################################################
# This script takes the tsv output from          #
# pysam_profiler.py and applies a 10 kb          #
# non-overlapping window across each scaffold,   #
# counting sites as heterozygous if they pass    #
# the conditions of the function is_het. This    #
# script is currently hard-coded for my data.    #
##################################################

from __future__ import print_function
from __future__ import division
import pandas as pd
import multiprocessing as mp
import numpy as np

def apply_df(df, func, *args):
    return df.apply(lambda x: func(x, *args), axis=1)


def apply_by_multiprocessing(df, func, workers, *args):
    pool = mp.Pool(processes=workers)
    result = [pool.apply_async(apply_df, args = (d, func) + args) for d in np.array_split(df, workers)]
    output = [p.get() for p in result]
    pool.close()
    return pd.concat(output)



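def check_equal_het(row):
    # A site is an even split when exactly two bases are observed
    # and both are supported by the same number of reads.
    counts = [row['A'],row['T'],row['C'],row['G']]
    non_zero_counts = [n for n in counts if n > 0]
    if len(non_zero_counts)==2 and len(set(non_zero_counts))==1:
        return True
    else:
        return False


def is_het(row, avg_cov):
    # Require coverage above 8 reads; avg_cov is accepted but not used here.
    if row['cov'] <= 8:
        return False
    elif check_equal_het(row):
        return True
    else:
        return False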


def load_het_profiles(path, cpus, avg_cov):
    het_profs = pd.read_csv(path,sep='\t',header=0)
    #het_profs['is_het'] = het_profs.apply(lambda row: is_het(row, avg_cov))
    het_profs['is_het'] = apply_by_multiprocessing(het_profs, is_het, cpus, avg_cov) #this broke with all sites
    het_profs[het_profs['is_het']==True].to_csv(path + '.is_het8.nolim',sep = '\t', index=False)
    
    return het_profs


def return_window_df(het_profs, scaffold_sizes, window=10000):
    windows = []
    append = windows.append
    for scaffold in het_profs.chrom.unique():
        scaffold_data = het_profs[het_profs.chrom == scaffold]
        scaffold_size = scaffold_sizes[scaffold_sizes.scaffold == scaffold]['scaffold_size'].values[0]
        for i in xrange(0,scaffold_size, window):
            if i+window <= scaffold_size:
                num_het = scaffold_data[(scaffold_data['pos']>=i)&(scaffold_data['pos']<i+window)]['is_het'].sum()
                size = window
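            else:
                # Final partial window: use >= so position i is included, as in the full-window case.
                num_het = scaffold_data[scaffold_data['pos']>=i]['is_het'].sum()
                # Note: size here is the number of profiled sites in the partial window.
                size = len(scaffold_data[scaffold_data['pos']>=i])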

            append((scaffold, i, num_het, size))

    window_df = pd.DataFrame(windows, columns = ['chrom','window_start', 'het_sites','window_size'])
    
    return window_df
    
def write_window_df(het_prof_path, avg_cov, scaffold_sizes_path, cpus, window=10000):
    scaffold_sizes = pd.read_csv(scaffold_sizes_path,sep='\t', names = ['scaffold','scaffold_size'])
    het_profs = load_het_profiles(het_prof_path, cpus, avg_cov)
    window_df = return_window_df(het_profs, scaffold_sizes, window)
    window_df.to_csv(het_prof_path+'.%skbwindows8.nolim.tsv' % str(window/1000), sep='\t',index=False)
    return None

if __name__ == '__main__':
    scaffold_sizes_path = '/home/dut/projects/tigris/genome_annotation/fasta/scaffold_sizes.clean.tsv'

    paths =['/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/pysam/Atig001.merged.dedup.realigned.prof.not_hom',
            '/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/pysam/Atig003.merged.dedup.realigned.prof.not_hom',
            '/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/pysam/Atig_122.merged.dedup.realigned.prof.not_hom',
            '/n/projects/dut/a_marmorata/parthenogen_heterozygosity/data/pysam/A_tigris8450.merged.dedup.realigned.prof.not_hom']

    avg_covs = [16, 18,19,18]
    for path, avg_cov in zip(paths, avg_covs):
        print(path)
        write_window_df(path, avg_cov, scaffold_sizes_path, 8, 10000)
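The windowing step in return_window_df can be illustrated on a toy scaffold. The sketch below is a plain-Python illustration, not the script itself: it assigns each heterozygous position to a window by integer division and then reports every window, including empty ones and the final partial window. The function name count_het_per_window and the toy positions are hypothetical.

```python
from collections import Counter


def count_het_per_window(het_positions, scaffold_size, window=10000):
    """Return (window_start, het_count) for consecutive non-overlapping windows."""
    # Integer division maps a position to the start of the window containing it.
    counts = Counter((pos // window) * window for pos in het_positions)
    # Walk the scaffold in steps of `window` so empty windows are reported too.
    return [(start, counts.get(start, 0))
            for start in range(0, scaffold_size, window)]


toy_het_positions = [12, 9999, 10000, 25000, 31234]  # hypothetical sites
windows = count_het_per_window(toy_het_positions, scaffold_size=35000)
for start, n_het in windows:
    print(start, n_het)
```

Position 10000 falls in the second window because each window covers the half-open interval from start up to, but not including, start + window.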

    
In [150]:
equalWindowPaths = {
    'Atig001': '../data/pysam/Atig001.merged.dedup.realigned.prof.not_hom.10.0kbwindows8.nolim.tsv',
    'A_tigris8450': '../data/pysam/A_tigris8450.merged.dedup.realigned.prof.not_hom.10.0kbwindows8.nolim.tsv',
    'Atig003': '../data/pysam/Atig003.merged.dedup.realigned.prof.not_hom.10.0kbwindows8.nolim.tsv',
    'Atig_122': '../data/pysam/Atig_122.merged.dedup.realigned.prof.not_hom.10.0kbwindows8.nolim.tsv',
    'A.tig_12512': '../data/pysam/A.tig_12512.merged.dedup.realigned.prof.not_hom.10.0kbwindows8.nolim.tsv',
    'A.tig_12513': '../data/pysam/A.tig_12513.merged.dedup.realigned.prof.not_hom.10.0kbwindows8.nolim.tsv',
    'A.tig_9721': '../data/pysam/A.tig_9721.merged.dedup.realigned.prof.not_hom.10.0kbwindows8.nolim.tsv',
    'Atig_4278': '../data/pysam/Atig_4278.merged.dedup.realigned.prof.not_hom.10.0kbwindows8.nolim.tsv',
    'Atig_6993': '../data/pysam/Atig_6993.merged.dedup.realigned.prof.not_hom.10.0kbwindows8.nolim.tsv',
    'Atig_9177': '../data/pysam/Atig_9177.merged.dedup.realigned.prof.not_hom.10.0kbwindows8.nolim.tsv',
}
In [151]:
def ret_equal_het_windows(window_paths):
    for i, sample_name in enumerate(window_paths):        
        path = window_paths[sample_name]
        window_df = pd.read_csv(path, sep='\t')
        window_df['animal'] = sample_name
        if i == 0:
            df_equal_het_windows = window_df
        else:
            df_equal_het_windows = pd.concat([df_equal_het_windows, window_df])

        print(len(df_equal_het_windows), i, path, sample_name)
        
    return df_equal_het_windows
In [152]:
equalHetWindowsDF = ret_equal_het_windows(equalWindowPaths)
163465 0 ../data/pysam/A.tig_9721.merged.dedup.realigned.prof.not_hom.10.0kbwindows8.nolim.tsv A.tig_9721
326930 1 ../data/pysam/A_tigris8450.merged.dedup.realigned.prof.not_hom.10.0kbwindows8.nolim.tsv A_tigris8450
490395 2 ../data/pysam/A.tig_12512.merged.dedup.realigned.prof.not_hom.10.0kbwindows8.nolim.tsv A.tig_12512
653860 3 ../data/pysam/A.tig_12513.merged.dedup.realigned.prof.not_hom.10.0kbwindows8.nolim.tsv A.tig_12513
817325 4 ../data/pysam/Atig_4278.merged.dedup.realigned.prof.not_hom.10.0kbwindows8.nolim.tsv Atig_4278
980790 5 ../data/pysam/Atig003.merged.dedup.realigned.prof.not_hom.10.0kbwindows8.nolim.tsv Atig003
1144255 6 ../data/pysam/Atig001.merged.dedup.realigned.prof.not_hom.10.0kbwindows8.nolim.tsv Atig001
1307720 7 ../data/pysam/Atig_6993.merged.dedup.realigned.prof.not_hom.10.0kbwindows8.nolim.tsv Atig_6993
1471185 8 ../data/pysam/Atig_9177.merged.dedup.realigned.prof.not_hom.10.0kbwindows8.nolim.tsv Atig_9177
1634650 9 ../data/pysam/Atig_122.merged.dedup.realigned.prof.not_hom.10.0kbwindows8.nolim.tsv Atig_122
In [153]:
equalHetWindowsDF['chrom_size'] = equalHetWindowsDF.chrom.apply(lambda x: scaffoldSizeDict[x])
In [154]:
equalHetWindowsDF.sort_values(['animal','chrom_size', 'window_start'], ascending=[True, False, True], inplace=True)
equalHetWindowsDF.head()
Out[154]:
        chrom  window_start  het_sites  window_size       animal  chrom_size
0  Scpiz6a_49             0          1        10000  A.tig_12512    85027298
1  Scpiz6a_49         10000          0        10000  A.tig_12512    85027298
2  Scpiz6a_49         20000          0        10000  A.tig_12512    85027298
3  Scpiz6a_49         30000          0        10000  A.tig_12512    85027298
4  Scpiz6a_49         40000          0        10000  A.tig_12512    85027298
In [155]:
fig = plt.figure(1, figsize=(3.2, 1.93),dpi=200)
ax = fig.add_subplot(111)

for animal in animal_ids:
    if animal in equalHetWindowsDF.animal.unique():
        rolling_mean_data = equalHetWindowsDF[equalHetWindowsDF.animal==animal].het_sites.rolling(100).mean().dropna()
        sns.distplot(rolling_mean_data,
                     kde=False,
                     hist_kws={"alpha":1, 
                               "color":color_ids[animal], 
                               "linewidth":2, 
                               "histtype": "step"},
                     label=id_to_name[animal],
                     ax=ax)


#ax5.set_title('1Mb Heterozygosity Distribution', fontsize=minorFontSize)
ax.set_ylabel('Number of Windows', fontsize=minorFontSize)
ax.set_xlabel('Mean of Het Sites per 10kb', fontsize=minorFontSize)
ax.legend(loc=5, bbox_to_anchor=(1.32, 0.5), prop={'size':8})
ax.ticklabel_format(style='sci', scilimits=(-3,4), axis='y',useOffset=True)
fig.savefig('../fig2/Figure3C.pdf')
In [156]:
fig = plt.figure(1, figsize=(6.4, 1.93), dpi=300)
ax = fig.add_subplot(111)
step=500
for animal in animal_ids:
    five_mb_rolling = equalHetWindowsDF[(equalHetWindowsDF['animal']==animal) & (equalHetWindowsDF['chrom_size']>5e5)].reset_index().het_sites.rolling(window=step)
    
    five_mb_rolling.mean().plot(legend=False, 
                                      rot = 90, 
                                      #style='.', 
                                      linewidth=0.8,
                                      ax=ax, 
                                      label=id_to_name[animal], 
                                      color=color_ids[animal], 
                                      #markersize=1.0,
                                      fontsize=minorFontSize,
                                     rasterized=True)
    
    ax.fill_between(five_mb_rolling.mean().index, 
                     five_mb_rolling.mean()-2*five_mb_rolling.std()/np.sqrt(step), 
                     five_mb_rolling.mean()+2*five_mb_rolling.std()/np.sqrt(step), 
                     color=color_ids[animal], 
                     alpha=0.3,
                    rasterized=True)

gigabase_labels = [str(x*10000/1e9) for x in ax.get_xticks()]  # window index -> Gb
ax.set_title('Genome Wide Heterozygosity 5Mb Sliding Window', fontsize=minorFontSize)
ax.set_xticklabels(gigabase_labels, rotation=90, fontsize=minorFontSize) 
ax.set_xlabel('Window Start Position (Gb)', fontsize=minorFontSize)
ax.set_ylabel('Avg. Het. Sites / 10kb', fontsize=minorFontSize)
ax.set_ylim(-0.1,4)
ax.legend(loc=5, bbox_to_anchor=(1.20, 0.5), markerscale=5, fontsize=minorFontSize)

#fig.savefig('../fig2/Figure4A.pdf',bbox_inches='tight',dpi=300)
#fig.savefig('../fig/Figure4A.png',bbox_inches='tight',dpi=300)
Out[156]:
<matplotlib.legend.Legend at 0x7f62d6ae72d0>
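The shaded band drawn with fill_between in the cell above is the rolling mean plus or minus two standard errors, where the standard error is the rolling standard deviation divided by sqrt(step). A minimal plain-Python sketch of that calculation follows, using the sample standard deviation (ddof=1, the pandas default). The function name rolling_mean_sem_band and the toy counts are hypothetical.

```python
import math


def rolling_mean_sem_band(values, step, n_se=2):
    """Return (mean, lower, upper) for each full trailing window of `step` values."""
    bands = []
    for end in range(step, len(values) + 1):
        chunk = values[end - step:end]
        mean = sum(chunk) / float(step)
        # Sample variance with ddof=1.
        variance = sum((v - mean) ** 2 for v in chunk) / float(step - 1)
        half_width = n_se * math.sqrt(variance) / math.sqrt(step)
        bands.append((mean, mean - half_width, mean + half_width))
    return bands


toy_het_counts = [0, 2, 1, 3, 0, 4]  # hypothetical het sites per 10 kb window
bands = rolling_mean_sem_band(toy_het_counts, step=4)
for mean, lower, upper in bands:
    print(round(mean, 3), round(lower, 3), round(upper, 3))
```

With step=500 and 10 kb windows, each point on the plotted line summarizes 5 Mb of sequence.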
In [157]:
step=500
for i, family in enumerate([family_1, family_2, family_3]):
    fig = plt.figure(1, figsize=(6.4, 1.0), dpi=300)
    ax = fig.add_subplot(111)
    for animal in family:
        five_mb_rolling = equalHetWindowsDF[(equalHetWindowsDF['animal']==animal) & (equalHetWindowsDF['chrom_size']>5e5)].reset_index().het_sites.rolling(window=step)

        five_mb_rolling.mean().plot(legend=False, 
                                          rot = 90, 
                                          #style='.', 
                                          linewidth=0.8,
                                          ax=ax, 
                                          label=id_to_name[animal], 
                                          color=family_colors[animal], 
                                          #markersize=1.0,
                                          alpha=0.7,
                                          fontsize=minorFontSize,
                                         rasterized=True)

        ax.fill_between(five_mb_rolling.mean().index, 
                         five_mb_rolling.mean()-2*five_mb_rolling.std()/np.sqrt(step), 
                         five_mb_rolling.mean()+2*five_mb_rolling.std()/np.sqrt(step), 
                         color=family_colors[animal], 
                         alpha=0.3,
                        rasterized=True)

    gigabase_labels = [str(x*10000/1e9) for x in ax.get_xticks()]  # window index -> Gb
    #ax.set_title('Genome Wide Heterozygosity 5Mb Sliding Window', fontsize=minorFontSize)
    ax.set_xticklabels(gigabase_labels, rotation=90, fontsize=minorFontSize) 
    ax.set_xlabel('Window Start Position (Gb)', fontsize=minorFontSize)
    ax.set_ylabel('Avg. Het. Sites / 10kb', fontsize=minorFontSize-3)
    ax.set_ylim(-0.1,4)
    ax.legend(loc=5, bbox_to_anchor=(1.20, 0.5), markerscale=5, fontsize=minorFontSize)
#     fig.savefig('../fig2/Figure4%s.pdf' % letters[i], format="pdf",bbox_inches='tight')
    plt.show()
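The tick labels in these plots convert a row index into an approximate genomic coordinate. Each row is one 10 kb window, so index * 10000 / 1e9 gives gigabases. Because windows from scaffolds of 500 kb or less are filtered out before reset_index, the axis measures the cumulative length of the retained windows, not an exact assembly coordinate. The sketch below shows the conversion on hypothetical tick positions. The helper name window_index_to_gb is an assumption for illustration.

```python
WINDOW_BP = 10000  # each row of the window table covers 10 kb


def window_index_to_gb(index, window_bp=WINDOW_BP):
    """Convert a window row index to gigabases along the concatenated axis."""
    return index * window_bp / 1e9


tick_positions = [0, 50000, 100000, 150000]  # hypothetical x tick locations
labels = ['%.1f' % window_index_to_gb(x) for x in tick_positions]
print(labels)
```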
In [158]:
step=500
fig,axarr = plt.subplots(3, 1, 
    sharex=True, 
    figsize=(6.2, 3), 
    dpi=200, 
    gridspec_kw = {'wspace':0.0, 'hspace':0.1}
)
for i, family in enumerate([family_1, family_2, family_3]):
    ax=axarr[i]
    for animal in family:
        five_mb_rolling = equalHetWindowsDF[(equalHetWindowsDF['animal']==animal) & (equalHetWindowsDF['chrom_size']>5e5)].reset_index().het_sites.rolling(window=step)

        five_mb_rolling.mean().plot(legend=False, 
                                          rot = 90, 
                                          #style='.', 
                                          linewidth=0.8,
                                          ax=ax, 
                                          label=id_to_name[animal], 
                                          color=family_colors[animal], 
                                          #markersize=1.0,
                                          alpha=0.7,
                                          fontsize=minorFontSize,
                                         rasterized=True)

        ax.fill_between(five_mb_rolling.mean().index, 
                         five_mb_rolling.mean()-2*five_mb_rolling.std()/np.sqrt(step), 
                         five_mb_rolling.mean()+2*five_mb_rolling.std()/np.sqrt(step), 
                         color=family_colors[animal], 
                         alpha=0.3,
                        rasterized=True)
    
    ax.set_yticks(np.arange(0,4.0,1.0))
    gigabase_labels = [str(x*10000/1e9) for x in ax.get_xticks()]  # window index -> Gb
    #ax.set_title('Genome Wide Heterozygosity 5Mb Sliding Window', fontsize=minorFontSize)
    ax.set_xticklabels(gigabase_labels, rotation=90, fontsize=minorFontSize)
    
    ax.set_xlabel('Window Start Position (Gb)', fontsize=minorFontSize)
    #ax.set_ylabel('Avg. Het. Sites / 10kb', fontsize=minorFontSize-3)
    ax.set_ylim(-0.1,3.5)
    ax.legend(loc=5, bbox_to_anchor=(1.20, 0.5), markerscale=5, fontsize=minorFontSize)

fig.text(0.04, 0.5, 'Avg. Het. Sites / 10kb', ha='center', va='center', fontsize=minorFontSize, rotation='vertical')
fig.savefig('../fig2/Figure4A.pdf', format="pdf",bbox_inches='tight')
In [216]:
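step=100  # 100 windows x 10 kb = 1 Mb rolling window (the five_mb_rolling name below is kept from the 5 Mb version)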
fig,axarr = plt.subplots(3, 1, 
    sharex=True, 
    figsize=(6.2, 3), 
    dpi=200, 
    gridspec_kw = {'wspace':0.0, 'hspace':0.1}
)
for i, family in enumerate([family_1, family_2, family_3]):
    ax=axarr[i]
    for animal in family:
        five_mb_rolling = equalHetWindowsDF[(equalHetWindowsDF['animal']==animal) & (equalHetWindowsDF['chrom_size']>5e5)].reset_index().het_sites.rolling(window=step)

        five_mb_rolling.mean().plot(legend=False, 
                                          rot = 90, 
                                          #style='.', 
                                          linewidth=0.8,
                                          ax=ax, 
                                          label=id_to_name[animal], 
                                          color=family_colors[animal], 
                                          #markersize=1.0,
                                          alpha=0.7,
                                          fontsize=minorFontSize,
                                         rasterized=True)

        ax.fill_between(five_mb_rolling.mean().index, 
                         five_mb_rolling.mean()-2*five_mb_rolling.std()/np.sqrt(step), 
                         five_mb_rolling.mean()+2*five_mb_rolling.std()/np.sqrt(step), 
                         color=family_colors[animal], 
                         alpha=0.3,
                        rasterized=True)
    
    ax.set_yticks(np.arange(0,4.0,1.0))
    gigabase_labels = [str(x*10000/1e9) for x in ax.get_xticks()]  # window index -> Gb
    #ax.set_title('Genome Wide Heterozygosity 5Mb Sliding Window', fontsize=minorFontSize)
    ax.set_xticklabels(gigabase_labels, rotation=90, fontsize=minorFontSize)
    
    ax.set_xlabel('Window Start Position (Gb)', fontsize=minorFontSize)
    #ax.set_ylabel('Avg. Het. Sites / 10kb', fontsize=minorFontSize-3)
    ax.set_ylim(-0.1,4.5)
    ax.legend(loc=5, bbox_to_anchor=(1.20, 0.5), markerscale=5, fontsize=minorFontSize)

fig.text(0.04, 0.5, 'Avg. Het. Sites / 10kb', ha='center', va='center', fontsize=minorFontSize, rotation='vertical')
#fig.savefig('../fig2/Figure4A.pdf', format="pdf",bbox_inches='tight')
Out[216]:
<matplotlib.text.Text at 0x7f6350242fd0>
In [215]:
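step=50  # 50 windows x 10 kb = 0.5 Mb rolling window (the five_mb_rolling name below is kept from the 5 Mb version)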
fig,axarr = plt.subplots(3, 1, 
    sharex=True, 
    figsize=(6.2, 3), 
    dpi=200, 
    gridspec_kw = {'wspace':0.0, 'hspace':0.1}
)
for i, family in enumerate([family_1, family_2, family_3]):
    ax=axarr[i]
    for animal in family:
        five_mb_rolling = equalHetWindowsDF[(equalHetWindowsDF['animal']==animal) & (equalHetWindowsDF['chrom_size']>5e5)].reset_index().het_sites.rolling(window=step)

        five_mb_rolling.mean().plot(legend=False, 
                                          rot = 90, 
                                          #style='.', 
                                          linewidth=0.8,
                                          ax=ax, 
                                          label=id_to_name[animal], 
                                          color=family_colors[animal], 
                                          #markersize=1.0,
                                          alpha=0.7,
                                          fontsize=minorFontSize,
                                         rasterized=True)

        ax.fill_between(five_mb_rolling.mean().index, 
                         five_mb_rolling.mean()-2*five_mb_rolling.std()/np.sqrt(step), 
                         five_mb_rolling.mean()+2*five_mb_rolling.std()/np.sqrt(step), 
                         color=family_colors[animal], 
                         alpha=0.3,
                        rasterized=True)
    
    ax.set_yticks(np.arange(0,4.0,1.0))
    megabase_labels = [str((x*10000/1000000000)) for x in ax.get_xticks()]
    #ax.set_title('Genome Wide Heterozygosity 5Mb Sliding Window', fontsize=minorFontSize)
    ax.set_xticklabels(megabase_labels, rotation=90, fontsize=minorFontSize)
    
    ax.set_xlabel('Window Start Position (Gb)', fontsize=minorFontSize)
    #ax.set_ylabel('Avg. Het. Sites / 10kb', fontsize=minorFontSize-3)
    ax.set_ylim(-0.1,4.5)
    ax.legend(loc=5, bbox_to_anchor=(1.20, 0.5), markerscale=5, fontsize=minorFontSize)

fig.text(0.04, 0.5, 'Avg. Het. Sites / 10kb', ha='center', va='center', fontsize=minorFontSize, rotation='vertical')
#fig.savefig('../fig2/Figure4A.pdf', format="pdf",bbox_inches='tight')
In [218]:
step=10
fig,axarr = plt.subplots(3, 1, 
    sharex=True, 
    figsize=(6.2, 3), 
    dpi=200, 
    gridspec_kw = {'wspace':0.0, 'hspace':0.1}
)
for i, family in enumerate([family_1, family_2, family_3]):
    ax=axarr[i]
    for animal in family:
        five_mb_rolling = equalHetWindowsDF[(equalHetWindowsDF['animal']==animal) & (equalHetWindowsDF['chrom_size']>5e5)].reset_index().het_sites.rolling(window=step)

        five_mb_rolling.mean().plot(legend=False, 
                                          rot = 90, 
                                          #style='.', 
                                          linewidth=0.8,
                                          ax=ax, 
                                          label=id_to_name[animal], 
                                          color=family_colors[animal], 
                                          #markersize=1.0,
                                          alpha=0.7,
                                          fontsize=minorFontSize,
                                         rasterized=True)

        ax.fill_between(five_mb_rolling.mean().index, 
                         five_mb_rolling.mean()-2*five_mb_rolling.std()/np.sqrt(step), 
                         five_mb_rolling.mean()+2*five_mb_rolling.std()/np.sqrt(step), 
                         color=family_colors[animal], 
                         alpha=0.3,
                        rasterized=True)
    
    ax.set_yticks(np.arange(0,6.0,1.0))
    megabase_labels = [str((x*10000/1000000000)) for x in ax.get_xticks()]
    #ax.set_title('Genome Wide Heterozygosity 5Mb Sliding Window', fontsize=minorFontSize)
    ax.set_xticklabels(megabase_labels, rotation=90, fontsize=minorFontSize)
    
    ax.set_xlabel('Window Start Position (Gb)', fontsize=minorFontSize)
    #ax.set_ylabel('Avg. Het. Sites / 10kb', fontsize=minorFontSize-3)
    ax.set_ylim(-0.1,6.5)
    ax.legend(loc=5, bbox_to_anchor=(1.20, 0.5), markerscale=5, fontsize=minorFontSize)

fig.text(0.04, 0.5, 'Avg. Het. Sites / 10kb', ha='center', va='center', fontsize=minorFontSize, rotation='vertical')
#fig.savefig('../fig2/Figure4A.pdf', format="pdf",bbox_inches='tight')
Out[218]:
<matplotlib.text.Text at 0x7f63a8f0aad0>
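
The shaded bands in the panels above are the rolling mean plus or minus two standard errors, computed as 2*std/sqrt(step). Below is a minimal plain-Python sketch of that calculation on a short made-up list of counts. The function name `rolling_mean_band` and the toy data are assumptions for illustration and are not part of the original analysis. It uses the sample standard deviation (n - 1 denominator), which is the pandas default for `rolling(...).std()`.

```python
import math

def rolling_mean_band(values, step):
    """Return (mean, lower, upper) for each trailing window of `step` values.

    Positions with fewer than `step` values yield None, mirroring the NaN
    entries a pandas rolling window produces at the start of a series.
    Requires step >= 2 so that the sample variance is defined.
    """
    out = []
    for end in range(1, len(values) + 1):
        if end < step:
            out.append(None)
            continue
        window = values[end - step:end]
        mean = sum(window) / float(step)
        # Sample variance with n - 1 in the denominator.
        var = sum((v - mean) ** 2 for v in window) / float(step - 1)
        sem = math.sqrt(var) / math.sqrt(step)
        out.append((mean, mean - 2 * sem, mean + 2 * sem))
    return out

# Hypothetical het-site counts per 10 kb window (not real data).
toy_counts = [1, 2, 3, 4, 5, 6]
bands = rolling_mean_band(toy_counts, step=3)
```

With step=50 and 10 kb windows each band summarizes 0.5 Mb of sequence; with step=10 it summarizes 0.1 Mb, which is why the narrower window gives a noisier trace.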
In [160]:
step = 500
mother = equalHetWindowsDF[(equalHetWindowsDF.animal == 'Atig_122') & (equalHetWindowsDF['chrom_size']>5e5)].reset_index().het_sites.rolling(window=step).mean()
fp_animal = equalHetWindowsDF[(equalHetWindowsDF.animal == 'A_tigris8450') & (equalHetWindowsDF['chrom_size']>5e5)].reset_index().het_sites.rolling(window=step).mean()
ratio = mother/fp_animal

ax = np.log2(ratio).plot(legend=False, 
                rot = 90, 
                style='-',
                label='Atig_122:A_tigris8450',
                color='black',
                markersize=0.5,
               figsize=(6.4,2),
               fontsize=minorFontSize)

ax.set_title('Ratio of 122:8450 Average Het Sites per 10kb, %sMb Sliding Window' % str(step/100), fontsize=minorFontSize)
megabase_labels = [str((x*10000/1000000000)) for x in ax.get_xticks()]
ax.set_xticklabels(megabase_labels,rotation=90, fontsize=minorFontSize) 
ax.set_xlabel('Window Start Position (Gb)', fontsize=minorFontSize)
ax.set_ylabel('log2 ratio', fontsize=minorFontSize)
ax.set_ylim(0,9)
mean_r = np.log2(ratio).mean()
se_r = np.log2(ratio).std()#/np.sqrt(len(ratio.dropna()))
ax.text(1.0e4,0.25, 'average log2(ratio) = %s\nstandard deviation log2(ratio) = %s'%(str(mean_r), str(se_r)), fontsize=minorFontSize)
fig = ax.get_figure()
fig.savefig('../fig2/SupplementalFigure7.pdf', bbox_inches='tight', pad_inches=0.25)
fig.savefig('../fig/SupplementalFigure7.png', bbox_inches='tight', pad_inches=0.25, dpi=300)

V2R Copy Number analysis MAKER Annotations

The first table I read in gives the copy number call for every gene in the MAKER annotations.

In [161]:
gffCopyNumMarmDF = pd.read_csv('../../dovetail_vomeronasal/data/annotCopyNumsDF.tsv', sep='\t')
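gffCopyNumMarmDF = gffCopyNumMarmDF[gffCopyNumMarmDF.type=="gene"].copy()
# Genes without a copy number call are treated as single copy.
gffCopyNumMarmDF['cnv'] = gffCopyNumMarmDF['cnv'].fillna('CN1')
# Parse the integer after the 'CN' prefix (e.g. 'CN3' -> 3).
gffCopyNumMarmDF['copies'] = gffCopyNumMarmDF['cnv'].apply(lambda x: int(x[2:]))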
gffCopyNumMarmDF.head()
Out[161]:
seqnames start end width strand source type score phase ID ... Note Dbxref Parent X_AED X_QI X_eAED Ontology_term score.1 cnv copies
0 Scpiz6a_1 69274 86971.0 17698 + maker gene NaN NaN gene1 ... Similar to LYPD6: Ly6/PLAUR domain-containing protein 6 (Homo sapiens) c("Gene3D:G3DSA:2.10.60.10", "SUPERFAMILY:SSF57302") character(0) NaN NaN NaN character(0) NaN CN1 1
11 Scpiz6a_1 105656 124326.0 18671 - maker gene NaN NaN gene2 ... Similar to MMADHC: Methylmalonic aciduria and homocystinuria type D homolog, mitochondrial (Gallus gallus) c("InterPro:IPR019362", "Pfam:PF10229") character(0) NaN NaN NaN GO:0009235 NaN CN1 1
40 Scpiz6a_1 458899 468953.0 10055 - maker gene NaN NaN gene3 ... Similar to Rnd3: Rho-related GTP-binding protein RhoE (Rattus norvegicus) c("Gene3D:G3DSA:3.40.50.300", "InterPro:IPR001806", "InterPro:IPR003578", "InterPro:IPR003579", "InterPro:IPR027417", "Pfam:PF00071", "ProSiteProfiles:PS51420", "SMART:SM00174", "SMART:SM00175", "SUPERFAMILY:SSF52540") character(0) NaN NaN NaN c("GO:0005525", "GO:0005622", "GO:0007264", "GO:0015031") NaN CN1 1
50 Scpiz6a_1 609021 609749.0 729 - maker gene NaN NaN gene4 ... Protein of unknown function c("Gene3D:G3DSA:1.10.443.10", "InterPro:IPR011010", "InterPro:IPR013762", "SUPERFAMILY:SSF56349") character(0) NaN NaN NaN c("GO:0003677", "GO:0006310", "GO:0015074") NaN CN1 1
54 Scpiz6a_1 677616 678006.0 391 - maker gene NaN NaN gene5 ... Similar to rgr-1: Mediator of RNA polymerase II transcription subunit 14 (Caenorhabditis briggsae) character(0) character(0) NaN NaN NaN character(0) 1246.0 CN1 1

5 rows × 22 columns
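
The `copies` column is derived from labels of the form `CN1`, `CN3`, and so on, with unlabeled genes treated as single copy. As a hedged sketch of that conversion outside pandas, the helper below parses the label with a regular expression and fails loudly on anything unexpected. The name `copies_from_cnv` is hypothetical, and the multi-digit label in the usage line is an assumption for illustration; no such label appears in this data.

```python
import re

def copies_from_cnv(label, default=1):
    """Convert a label such as 'CN3' to the integer 3.

    Missing labels (None or NaN) fall back to `default`, which matches
    filling missing values with 'CN1'.
    """
    if label is None or label != label:  # NaN is the only value != itself
        return default
    match = re.match(r'^CN(\d+)$', str(label))
    if match is None:
        raise ValueError('unexpected copy number label: %r' % (label,))
    return int(match.group(1))

# 'CN12' is a made-up label used only to show multi-digit handling.
examples = [copies_from_cnv(x) for x in ['CN1', 'CN4', 'CN12', None]]
```

Raising on an unrecognized label is a design choice: a silent fallback would hide malformed rows in the copy number table.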

Next I add a column to contain the gene symbol.

In [162]:
gffCopyNumMarmDF['gene_symbol'] = gffCopyNumMarmDF['Note'].apply(lambda x: x.replace('Similar to ', '').split(':')[0])
gffCopyNumMarmDF['gene_symbol'].head()
Out[162]:
0     LYPD6                      
11    MMADHC                     
40    Rnd3                       
50    Protein of unknown function
54    rgr-1                      
Name: gene_symbol, dtype: object
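
The gene symbol is whatever sits between the `Similar to ` prefix and the first colon of the `Note` field; notes without that pattern pass through unchanged. Here is a small standalone sketch of the same rule, with a guard for missing notes added as an assumption (the original lambda has no such guard). The function name `gene_symbol_from_note` is hypothetical.

```python
def gene_symbol_from_note(note):
    """Return the text between 'Similar to ' and the first colon."""
    if not isinstance(note, str):
        return None  # assumed handling for a missing Note field
    return note.replace('Similar to ', '').split(':')[0]

# Notes copied from the table above.
notes = [
    'Similar to LYPD6: Ly6/PLAUR domain-containing protein 6 (Homo sapiens)',
    'Similar to Rnd3: Rho-related GTP-binding protein RhoE (Rattus norvegicus)',
    'Protein of unknown function',
]
symbols = [gene_symbol_from_note(n) for n in notes]
```

The match is case sensitive, so symbols that differ only in capitalization between source species (for example `TRIM27` and `Trim27`) are counted as separate entries.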

Here I count the number of times each gene symbol occurs genome wide. Only gene symbols that occur more than 10 times are shown.

In [163]:
fig = plt.figure(1, figsize=(3.2, 3.2))
ax = fig.add_subplot(111)
value_counts = pd.DataFrame(gffCopyNumMarmDF[gffCopyNumMarmDF['gene_symbol'] != "Protein of unknown function"]['gene_symbol'].value_counts())
data = value_counts[value_counts.gene_symbol > 10]
data.plot(kind='bar', ax=ax, legend=False)
ax.set_ylabel('Genome Wide Counts')
ax.set_xlabel('Homologous Gene Symbol')
fig.savefig('../fig/vmnr_maker_all_counts.pdf', bbox_inches='tight')

Here are the numbers from that plot. Vmn2r26 occurs 309 times genome wide.

In [164]:
data
Out[164]:
gene_symbol
Vmn2r26 309
ZSCAN31 55
ZBED9 35
ZNF420 28
Znf24 28
TRIM27 21
cysS 18
rgr-1 17
Trim27 14
OR5V1 13
ZSCAN2 13
infB 13
At1g06800 12
ZFP2 11

Here I generate the same plot as above, but only for scaffold 45 (the scaffold that shows increased apparent heterozygosity). Only symbols with more than 4 occurrences are shown.

In [165]:
fig = plt.figure(1, figsize=(3.2, 3.2))
ax = fig.add_subplot(111)
value_counts = pd.DataFrame(gffCopyNumMarmDF[(gffCopyNumMarmDF['gene_symbol'] != "Protein of unknown function") & (gffCopyNumMarmDF.seqnames =='Scpiz6a_45')]['gene_symbol'].value_counts())
data = value_counts[value_counts.gene_symbol > 4]
data.plot(kind='bar', ax=ax, legend=False)
ax.set_ylabel('Scaffold 45 Counts')
ax.set_xlabel('Homologous Gene Symbol')
fig.savefig('../fig/vmnr_maker_45_counts.pdf', bbox_inches='tight')

Here is the data for this plot. Vmn2r26 is annotated 167 times on scaffold 45.

In [166]:
data
Out[166]:
gene_symbol
Vmn2r26 167
OR14I1 10
OR11G2 6
PRSS27 5

Here I save the subset of the table containing all Vmn2r26 annotations on scaffold 45.

In [167]:
vmnrGffCopyDF = gffCopyNumMarmDF[(gffCopyNumMarmDF.Note.str.contains('Vmn2r26',case=False)) & (gffCopyNumMarmDF.seqnames =='Scpiz6a_45')].reset_index().copy()
vmnrGffCopyDF.head()
Out[167]:
index seqnames start end width strand source type score phase ... Dbxref Parent X_AED X_QI X_eAED Ontology_term score.1 cnv copies gene_symbol
0 465598 Scpiz6a_45 5125710 5235885.0 110176 + maker gene NaN NaN ... c("Gene3D:G3DSA:3.40.50.2300", "InterPro:IPR000337", "InterPro:IPR001828", "InterPro:IPR004073", "InterPro:IPR011500", "InterPro:IPR017978", "InterPro:IPR017979", "InterPro:IPR028082", "PRINTS:PR00248", "PRINTS:PR01535", "Pfam:PF00003", "Pfam:PF01094", "Pfam:PF07562", "ProSitePatterns:PS00981", "ProSiteProfiles:PS50259", "SUPERFAMILY:SSF53822", "SUPERFAMILY:SSF81665") character(0) NaN NaN NaN c("GO:0004930", "GO:0007186", "GO:0016021") NaN CN1 1 Vmn2r26
1 465885 Scpiz6a_45 5243577 5320649.0 77073 - maker gene NaN NaN ... c("Gene3D:G3DSA:3.40.50.2300", "InterPro:IPR000337", "InterPro:IPR001828", "InterPro:IPR004073", "InterPro:IPR011500", "InterPro:IPR017978", "InterPro:IPR017979", "InterPro:IPR028082", "PRINTS:PR00248", "PRINTS:PR01535", "Pfam:PF00003", "Pfam:PF01094", "Pfam:PF07562", "ProSitePatterns:PS00981", "ProSiteProfiles:PS50259", "SUPERFAMILY:SSF53822", "SUPERFAMILY:SSF81665") character(0) NaN NaN NaN c("GO:0004930", "GO:0007186", "GO:0016021") NaN CN1 1 Vmn2r26
2 466138 Scpiz6a_45 5332681 5367671.0 34991 - maker gene NaN NaN ... c("Gene3D:G3DSA:3.40.50.2300", "InterPro:IPR000337", "InterPro:IPR001828", "InterPro:IPR004073", "InterPro:IPR011500", "InterPro:IPR017978", "InterPro:IPR017979", "InterPro:IPR028082", "PRINTS:PR00248", "PRINTS:PR01535", "Pfam:PF00003", "Pfam:PF01094", "Pfam:PF07562", "ProSitePatterns:PS00981", "ProSiteProfiles:PS50259", "SUPERFAMILY:SSF53822") character(0) NaN NaN NaN c("GO:0004930", "GO:0007186", "GO:0016021") NaN CN1 1 Vmn2r26
3 466383 Scpiz6a_45 5374537 5392755.0 18219 + maker gene NaN NaN ... c("Gene3D:G3DSA:3.40.50.2300", "InterPro:IPR000337", "InterPro:IPR001828", "InterPro:IPR004073", "InterPro:IPR011500", "InterPro:IPR017978", "InterPro:IPR017979", "InterPro:IPR028082", "PRINTS:PR00248", "PRINTS:PR01535", "Pfam:PF00003", "Pfam:PF01094", "Pfam:PF07562", "ProSitePatterns:PS00981", "ProSiteProfiles:PS50259", "SUPERFAMILY:SSF53822") character(0) NaN NaN NaN c("GO:0004930", "GO:0007186", "GO:0016021") NaN CN1 1 Vmn2r26
4 466470 Scpiz6a_45 5509516 5545895.0 36380 + maker gene NaN NaN ... c("Gene3D:G3DSA:3.40.50.2300", "InterPro:IPR000337", "InterPro:IPR001828", "InterPro:IPR004073", "InterPro:IPR011500", "InterPro:IPR017978", "InterPro:IPR017979", "InterPro:IPR028082", "PRINTS:PR00248", "PRINTS:PR01535", "Pfam:PF00003", "Pfam:PF01094", "Pfam:PF07562", "ProSitePatterns:PS00981", "ProSiteProfiles:PS50259", "SUPERFAMILY:SSF53822") character(0) NaN NaN NaN c("GO:0004930", "GO:0007186", "GO:0016021") NaN CN1 1 Vmn2r26

5 rows × 24 columns

In this next section I compare the copy number estimates in 8450 to the number of annotations present in the genome assembly.

There are 323 copies of Vmn2r26 in 8450, according to the MAKER gene annotations and the copy number variation analysis.

In [168]:
gffCopyNumMarmDF[(gffCopyNumMarmDF.type == 'gene') & (gffCopyNumMarmDF.Note.str.contains('Vmn2r26',case=False))].copies.sum()
Out[168]:
323

Of the 323 copies, 177 reside on scaffold 45.

In [169]:
vmnrGffCopyDF.copies.sum()
Out[169]:
177
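
As a quick worked check of the two totals above, the share of Vmn2r26 copies that sit on scaffold 45 can be computed directly. The variable names are mine; the numbers are the outputs of the two cells above.

```python
total_copies = 323        # Out[168]: Vmn2r26 copies genome wide
scaffold_45_copies = 177  # Out[169]: Vmn2r26 copies on scaffold 45

# Fraction of all Vmn2r26 copies located on scaffold 45.
fraction_on_45 = scaffold_45_copies / float(total_copies)
print('%.1f%% of Vmn2r26 copies are on scaffold 45' % (100 * fraction_on_45))
```

So a little over half of the estimated copies fall on this single scaffold.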
In [170]:
vmnrGffCopyDF.cnv.value_counts().plot(kind='bar')
Out[170]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f64f1eadf50>
In [171]:
vmnrDF = gffMarmDF[(gffMarmDF.feature == 'gene') & (gffMarmDF.gene_symbol=='Vmn2r26') & (gffMarmDF.seqid =='Scpiz6a_45')].copy()
vmnrDF['width'] = vmnrDF['end'] - vmnrDF['start']
vmnrDF['broken_bars'] = vmnrDF.apply(lambda row: (row['start']/10000, row['width']/10000), axis=1)
vmnrDF.head()
Out[171]:
seqid source feature start end score strand phase attributes gene_attributes gene_symbol gene_id name count_column width broken_bars
482606 Scpiz6a_45 MAKER2 gene 5125710.0 5235885.0 . + . ID=gene17679;Name=ATIG_00000233;Alias=maker-Scpiz6a_45-exonerate_protein2genome-gene-52.0;Note=Similar to Vmn2r26: Vomeronasal type-2 receptor 26 (Mus musculus);Dbxref=Gene3D:G3DSA:3.40.50.2300,InterPro:IPR000337,InterPro:IPR001828,InterPro:IPR004073,InterPro:IPR011500,InterPro:IPR017978,InterPro:IPR017979,InterPro:IPR028082,PRINTS:PR00248,PRINTS:PR01535,Pfam:PF00003,Pfam:PF01094,Pfam:PF07562,ProSitePatterns:PS00981,ProSiteProfiles:PS50259,SUPERFAMILY:SSF53822,SUPERFAMILY:SSF81665;Ontology_term=GO:0004930,GO:0007186,GO:0016021 ID=gene17679;Name=ATIG_00000233;Alias=maker-Scpiz6a_45-exonerate_protein2genome-gene-52.0;Note=Similar to Vmn2r26: Vomeronasal type-2 receptor 26 (Mus musculus);Dbxref=Gene3D:G3DSA:3.40.50.2300,InterPro:IPR000337,InterPro:IPR001828,InterPro:IPR004073,InterPro:IPR011500,InterPro:IPR017978,InterPro:IPR017979,InterPro:IPR028082,PRINTS:PR00248,PRINTS:PR01535,Pfam:PF00003,Pfam:PF01094,Pfam:PF07562,ProSitePatterns:PS00981,ProSiteProfiles:PS50259,SUPERFAMILY:SSF53822,SUPERFAMILY:SSF81665;Ontology_term=GO:0004930,GO:0007186,GO:0016021 Vmn2r26 gene17679 ATIG_00000233 (Scpiz6a_45, Vmn2r26) 110175.0 (512.571, 11.0175)
482894 Scpiz6a_45 MAKER2 gene 5243577.0 5320649.0 . - . ID=gene17680;Name=ATIG_00000234;Alias=maker-Scpiz6a_45-exonerate_protein2genome-gene-53.0;Note=Similar to Vmn2r26: Vomeronasal type-2 receptor 26 (Mus musculus);Dbxref=Gene3D:G3DSA:3.40.50.2300,InterPro:IPR000337,InterPro:IPR001828,InterPro:IPR004073,InterPro:IPR011500,InterPro:IPR017978,InterPro:IPR017979,InterPro:IPR028082,PRINTS:PR00248,PRINTS:PR01535,Pfam:PF00003,Pfam:PF01094,Pfam:PF07562,ProSitePatterns:PS00981,ProSiteProfiles:PS50259,SUPERFAMILY:SSF53822,SUPERFAMILY:SSF81665;Ontology_term=GO:0004930,GO:0007186,GO:0016021 ID=gene17680;Name=ATIG_00000234;Alias=maker-Scpiz6a_45-exonerate_protein2genome-gene-53.0;Note=Similar to Vmn2r26: Vomeronasal type-2 receptor 26 (Mus musculus);Dbxref=Gene3D:G3DSA:3.40.50.2300,InterPro:IPR000337,InterPro:IPR001828,InterPro:IPR004073,InterPro:IPR011500,InterPro:IPR017978,InterPro:IPR017979,InterPro:IPR028082,PRINTS:PR00248,PRINTS:PR01535,Pfam:PF00003,Pfam:PF01094,Pfam:PF07562,ProSitePatterns:PS00981,ProSiteProfiles:PS50259,SUPERFAMILY:SSF53822,SUPERFAMILY:SSF81665;Ontology_term=GO:0004930,GO:0007186,GO:0016021 Vmn2r26 gene17680 ATIG_00000234 (Scpiz6a_45, Vmn2r26) 77072.0 (524.3577, 7.7072)
483148 Scpiz6a_45 MAKER2 gene 5332681.0 5367671.0 . - . ID=gene17681;Name=ATIG_00000235;Alias=maker-Scpiz6a_45-exonerate_protein2genome-gene-53.1;Note=Similar to Vmn2r26: Vomeronasal type-2 receptor 26 (Mus musculus);Dbxref=Gene3D:G3DSA:3.40.50.2300,InterPro:IPR000337,InterPro:IPR001828,InterPro:IPR004073,InterPro:IPR011500,InterPro:IPR017978,InterPro:IPR017979,InterPro:IPR028082,PRINTS:PR00248,PRINTS:PR01535,Pfam:PF00003,Pfam:PF01094,Pfam:PF07562,ProSitePatterns:PS00981,ProSiteProfiles:PS50259,SUPERFAMILY:SSF53822;Ontology_term=GO:0004930,GO:0007186,GO:0016021 ID=gene17681;Name=ATIG_00000235;Alias=maker-Scpiz6a_45-exonerate_protein2genome-gene-53.1;Note=Similar to Vmn2r26: Vomeronasal type-2 receptor 26 (Mus musculus);Dbxref=Gene3D:G3DSA:3.40.50.2300,InterPro:IPR000337,InterPro:IPR001828,InterPro:IPR004073,InterPro:IPR011500,InterPro:IPR017978,InterPro:IPR017979,InterPro:IPR028082,PRINTS:PR00248,PRINTS:PR01535,Pfam:PF00003,Pfam:PF01094,Pfam:PF07562,ProSitePatterns:PS00981,ProSiteProfiles:PS50259,SUPERFAMILY:SSF53822;Ontology_term=GO:0004930,GO:0007186,GO:0016021 Vmn2r26 gene17681 ATIG_00000235 (Scpiz6a_45, Vmn2r26) 34990.0 (533.2681, 3.499)
483394 Scpiz6a_45 MAKER2 gene 5374537.0 5392755.0 . + . ID=gene17682;Name=ATIG_00000236;Alias=maker-Scpiz6a_45-exonerate_protein2genome-gene-55.0;Note=Similar to Vmn2r26: Vomeronasal type-2 receptor 26 (Mus musculus);Dbxref=Gene3D:G3DSA:3.40.50.2300,InterPro:IPR000337,InterPro:IPR001828,InterPro:IPR004073,InterPro:IPR011500,InterPro:IPR017978,InterPro:IPR017979,InterPro:IPR028082,PRINTS:PR00248,PRINTS:PR01535,Pfam:PF00003,Pfam:PF01094,Pfam:PF07562,ProSitePatterns:PS00981,ProSiteProfiles:PS50259,SUPERFAMILY:SSF53822;Ontology_term=GO:0004930,GO:0007186,GO:0016021 ID=gene17682;Name=ATIG_00000236;Alias=maker-Scpiz6a_45-exonerate_protein2genome-gene-55.0;Note=Similar to Vmn2r26: Vomeronasal type-2 receptor 26 (Mus musculus);Dbxref=Gene3D:G3DSA:3.40.50.2300,InterPro:IPR000337,InterPro:IPR001828,InterPro:IPR004073,InterPro:IPR011500,InterPro:IPR017978,InterPro:IPR017979,InterPro:IPR028082,PRINTS:PR00248,PRINTS:PR01535,Pfam:PF00003,Pfam:PF01094,Pfam:PF07562,ProSitePatterns:PS00981,ProSiteProfiles:PS50259,SUPERFAMILY:SSF53822;Ontology_term=GO:0004930,GO:0007186,GO:0016021 Vmn2r26 gene17682 ATIG_00000236 (Scpiz6a_45, Vmn2r26) 18218.0 (537.4537, 1.8218)
483482 Scpiz6a_45 MAKER2 gene 5509516.0 5545895.0 . + . ID=gene17683;Name=ATIG_00000237;Alias=maker-Scpiz6a_45-exonerate_protein2genome-gene-55.1;Note=Similar to Vmn2r26: Vomeronasal type-2 receptor 26 (Mus musculus);Dbxref=Gene3D:G3DSA:3.40.50.2300,InterPro:IPR000337,InterPro:IPR001828,InterPro:IPR004073,InterPro:IPR011500,InterPro:IPR017978,InterPro:IPR017979,InterPro:IPR028082,PRINTS:PR00248,PRINTS:PR01535,Pfam:PF00003,Pfam:PF01094,Pfam:PF07562,ProSitePatterns:PS00981,ProSiteProfiles:PS50259,SUPERFAMILY:SSF53822;Ontology_term=GO:0004930,GO:0007186,GO:0016021 ID=gene17683;Name=ATIG_00000237;Alias=maker-Scpiz6a_45-exonerate_protein2genome-gene-55.1;Note=Similar to Vmn2r26: Vomeronasal type-2 receptor 26 (Mus musculus);Dbxref=Gene3D:G3DSA:3.40.50.2300,InterPro:IPR000337,InterPro:IPR001828,InterPro:IPR004073,InterPro:IPR011500,InterPro:IPR017978,InterPro:IPR017979,InterPro:IPR028082,PRINTS:PR00248,PRINTS:PR01535,Pfam:PF00003,Pfam:PF01094,Pfam:PF07562,ProSitePatterns:PS00981,ProSiteProfiles:PS50259,SUPERFAMILY:SSF53822;Ontology_term=GO:0004930,GO:0007186,GO:0016021 Vmn2r26 gene17683 ATIG_00000237 (Scpiz6a_45, Vmn2r26) 36379.0 (550.9516, 3.6379)
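
The `broken_bars` tuples express each gene as (start, width) in units of 10 kb, so that gene positions can be drawn on the same x axis as the heterozygosity windows. This assumes the windows on a scaffold are contiguous and numbered from zero after `reset_index()`. A minimal sketch of the conversion follows; `to_broken_bar` and `WINDOW_SIZE` are hypothetical names.

```python
WINDOW_SIZE = 10000  # bp per heterozygosity window, as used for the axis labels

def to_broken_bar(start, end, window_size=WINDOW_SIZE):
    """Return (xmin, xwidth) in window units for a feature spanning start..end (bp)."""
    width = end - start
    return (start / float(window_size), width / float(window_size))

# First Vmn2r26 gene model on scaffold 45, taken from the table above.
bar = to_broken_bar(5125710.0, 5235885.0)
```

Pairs in this (xmin, xwidth) form are also what `Axes.broken_barh` expects as its first argument, which may be useful on newer Matplotlib releases where `BrokenBarHCollection` is deprecated.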
In [172]:
fig = plt.figure(1, figsize=(6.4, 1.93), dpi=200)
ax = fig.add_subplot(111)
step=100
scaffold='Scpiz6a_45'
lns = None
for animal in animal_ids:
    scaffold_het_rolling = equalHetWindowsDF[(equalHetWindowsDF['animal']==animal) & (equalHetWindowsDF['chrom']==scaffold)].reset_index().het_sites.rolling(window=step)
    ln1 = ax.plot(scaffold_het_rolling.mean(),
           '.',
            color=color_ids[animal],
            label=id_to_name[animal],
            markersize=2.5,
                 rasterized=True)

    ax.fill_between(scaffold_het_rolling.mean().index, 
                     scaffold_het_rolling.mean()-2*scaffold_het_rolling.std()/np.sqrt(step), 
                     scaffold_het_rolling.mean()+2*scaffold_het_rolling.std()/np.sqrt(step), 
                     color=color_ids[animal], 
                     alpha=0.3,
                   rasterized=True)
    if not lns:
        lns = ln1
    else:
        lns+=ln1
    
ax.set_ylim(-2.7, 5)    
ax.set_yticks(np.arange(0,5.0,1.0))
ax.set_yticklabels(ax.get_yticks(), fontsize=minorFontSize)
ax.set_ylabel('Avg. Het Sites / 10Kb', fontsize=minorFontSize)


megabase_labels = [str((x*10000/1000000)) for x in ax.get_xticks()]
ax.set_xticklabels(megabase_labels,rotation=90, fontsize=minorFontSize) 
ax.set_xlabel("Window Start Position (Mb)", fontsize=minorFontSize)

ax.set_title('%dMb Sliding Window Scaffold: %s' % (step*10000/1000000, scaffold), fontsize=minorFontSize)
ax.set_xlim(scaffold_het_rolling.mean().dropna().index.min(),scaffold_het_rolling.mean().dropna().index.max())

labs = [l.get_label() for l in lns]
ax.legend(lns, labs, loc=5, bbox_to_anchor=(1.20, 0.5),markerscale=5, fontsize=minorFontSize)



chrom = ax.add_collection(BrokenBarHCollection(
        [(scaffold_het_rolling.mean().dropna().index.min()+2, scaffold_het_rolling.mean().dropna().index.max()-105)], 
        (-2.2,1.5),
        facecolors=color_ids['A_tigris8450'],
        alpha=0.25,
        linewidths=[1]
                                      )
    )


ax.add_collection(BrokenBarHCollection(
        vmnrDF[vmnrDF['feature']=='gene'].broken_bars, 
        (-2.2,1.5),
        facecolors=['black' for i in xrange(len(vmnrDF.broken_bars))],
        linewidths=[0.5 for i in xrange(len(vmnrDF.broken_bars))]
                                      )
    )
fig.savefig('../fig2/Figure4B.pdf', bbox_inches='tight', )

V2R Exonerate Copy Number analysis

As an alternative to the MAKER annotations, I also identified V2R genes by aligning the mouse ortholog to the genome with exonerate.

In [173]:
vmnrCopyNumbers = pd.read_csv('../../dovetail_vomeronasal/data/vmnrCopyNumsDF.csv')
In [174]:
vmnrCopyNumbers.head()
Out[174]:
Unnamed: 0 seqnames start end width strand source type score phase ... similarity insertions deletions intron_id splice_site alignment_id Query Align frameshifts cnv
0 1 Scpiz6a_49 64908337 64910441 2105 + exonerate:protein2genome:local gene 683.0 NaN ... 63.30 NaN NaN NaN NaN NaN NaN NaN NaN CN1
1 2 Scpiz6a_49 64908337 64908463 127 + exonerate:protein2genome:local cds NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN CN1
2 3 Scpiz6a_49 64908337 64908463 127 + exonerate:protein2genome:local exon NaN NaN ... 69.05 0.0 0.0 NaN NaN NaN NaN NaN NaN CN1
3 4 Scpiz6a_49 64908464 64908465 2 + exonerate:protein2genome:local splice5 NaN NaN ... NaN NaN NaN 1.0 GT NaN NaN NaN NaN CN1
4 5 Scpiz6a_49 64908464 64909578 1115 + exonerate:protein2genome:local intron NaN NaN ... NaN NaN NaN 1.0 NaN NaN NaN NaN NaN CN1

5 rows × 24 columns

In [175]:
vmnrCopyNumbers['copies'] = vmnrCopyNumbers['cnv'].apply(lambda x: int(x[2]))
vmnrCopyNumbers['broken_bars'] = vmnrCopyNumbers.apply(lambda row: (row['start']/10000, row['width']/10000), axis=1)
vmnrCopyNumbers.head()
Out[175]:
Unnamed: 0 seqnames start end width strand source type score phase ... deletions intron_id splice_site alignment_id Query Align frameshifts cnv copies broken_bars
0 1 Scpiz6a_49 64908337 64910441 2105 + exonerate:protein2genome:local gene 683.0 NaN ... NaN NaN NaN NaN NaN NaN NaN CN1 1 (6490.8337, 0.2105)
1 2 Scpiz6a_49 64908337 64908463 127 + exonerate:protein2genome:local cds NaN NaN ... NaN NaN NaN NaN NaN NaN NaN CN1 1 (6490.8337, 0.0127)
2 3 Scpiz6a_49 64908337 64908463 127 + exonerate:protein2genome:local exon NaN NaN ... 0.0 NaN NaN NaN NaN NaN NaN CN1 1 (6490.8337, 0.0127)
3 4 Scpiz6a_49 64908464 64908465 2 + exonerate:protein2genome:local splice5 NaN NaN ... NaN 1.0 GT NaN NaN NaN NaN CN1 1 (6490.8464, 0.0002)
4 5 Scpiz6a_49 64908464 64909578 1115 + exonerate:protein2genome:local intron NaN NaN ... NaN 1.0 NaN NaN NaN NaN NaN CN1 1 (6490.8464, 0.1115)

5 rows × 26 columns

In [176]:
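# Sum the exon widths within each gene model and repeat that sum on every row of the model.
# Positional fields from itertuples(): row[5] is 'width', row[8] is 'type'.
exon_lengths = []
exon_length = None
for row in vmnrCopyNumbers.itertuples():
    if row[8] == 'gene':
        if exon_length is None:
            # First gene row in the table: start counting rows for this model.
            i=1
            exon_length = 0
            continue
        else:
            # A new gene starts: write the previous model's total once per row.
            exon_length = [exon_length]*i
            exon_lengths.extend(exon_length)
            exon_length = 0
            i=0
    if row[8] == "exon":
        exon_length+=row[5]
    i+=1

# Write the total for the last gene model.
exon_length = [exon_length]*i
exon_lengths.extend(exon_length)    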
    
vmnrCopyNumbers['sum_exon_lengths'] = exon_lengths
vmnrCopyNumbers.head()
Out[176]:
Unnamed: 0 seqnames start end width strand source type score phase ... intron_id splice_site alignment_id Query Align frameshifts cnv copies broken_bars sum_exon_lengths
0 1 Scpiz6a_49 64908337 64910441 2105 + exonerate:protein2genome:local gene 683.0 NaN ... NaN NaN NaN NaN NaN NaN CN1 1 (6490.8337, 0.2105) 990
1 2 Scpiz6a_49 64908337 64908463 127 + exonerate:protein2genome:local cds NaN NaN ... NaN NaN NaN NaN NaN NaN CN1 1 (6490.8337, 0.0127) 990
2 3 Scpiz6a_49 64908337 64908463 127 + exonerate:protein2genome:local exon NaN NaN ... NaN NaN NaN NaN NaN NaN CN1 1 (6490.8337, 0.0127) 990
3 4 Scpiz6a_49 64908464 64908465 2 + exonerate:protein2genome:local splice5 NaN NaN ... 1.0 GT NaN NaN NaN NaN CN1 1 (6490.8464, 0.0002) 990
4 5 Scpiz6a_49 64908464 64909578 1115 + exonerate:protein2genome:local intron NaN NaN ... 1.0 NaN NaN NaN NaN NaN CN1 1 (6490.8464, 0.1115) 990

5 rows × 27 columns
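
The loop above relies on the exonerate rows being in file order, with every gene model starting at a `gene` row. The same idea can be sketched in plain Python by collecting the rows of one model at a time. The names `sum_exon_lengths` and `_flush` are hypothetical, and the toy rows are illustrative: the first few widths come from the table above, the rest are made up.

```python
def _flush(model_rows):
    """Return the summed exon width, repeated once per row of the model."""
    total = sum(width for feature_type, width in model_rows
                if feature_type == 'exon')
    return [total] * len(model_rows)

def sum_exon_lengths(rows):
    """rows: list of (feature_type, width) in file order.

    Each gene model is assumed to begin with a 'gene' row. Returns a list
    with one exon-length total per input row.
    """
    totals = []
    current = []  # rows of the gene model currently being read
    for feature_type, width in rows:
        if feature_type == 'gene' and current:
            totals.extend(_flush(current))
            current = []
        current.append((feature_type, width))
    if current:
        totals.extend(_flush(current))
    return totals

toy_rows = [
    ('gene', 2105), ('cds', 127), ('exon', 127), ('splice5', 2),
    ('intron', 1115), ('exon', 863),   # 863 is a made-up second exon
    ('gene', 500), ('exon', 500),      # made-up second gene model
]
toy_totals = sum_exon_lengths(toy_rows)
```

Collecting rows per model avoids tracking a separate row counter, at the cost of holding one model in memory at a time.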

In [177]:
len(vmnrCopyNumbers[(vmnrCopyNumbers.seqnames == 'Scpiz6a_45') & (vmnrCopyNumbers['score']>1000) &(vmnrCopyNumbers['type'] == 'gene')])
Out[177]:
196
In [178]:
vmnrCopyNumbers[(vmnrCopyNumbers.seqnames == 'Scpiz6a_45') & (vmnrCopyNumbers['score']>1000) & (vmnrCopyNumbers['type'] == 'gene')].copies.sum()
Out[178]:
206
In [179]:
len(vmnrCopyNumbers[(vmnrCopyNumbers['type']=='gene') & (vmnrCopyNumbers['score']>1000)])
Out[179]:
205
In [180]:
vmnrCopyNumbers[(vmnrCopyNumbers['type']=='gene') & ((vmnrCopyNumbers['score']>1000))].copies.sum()
Out[180]:
215
In [181]:
len(vmnrCopyNumbers[(vmnrCopyNumbers.seqnames == 'Scpiz6a_45') & (vmnrCopyNumbers['score']>800) &(vmnrCopyNumbers['type'] == 'gene')])
Out[181]:
309
In [182]:
vmnrCopyNumbers[(vmnrCopyNumbers.seqnames == 'Scpiz6a_45') & (vmnrCopyNumbers['score']>800) & (vmnrCopyNumbers['type'] == 'gene')].copies.sum()
Out[182]:
319
In [183]:
len(vmnrCopyNumbers[(vmnrCopyNumbers['type']=='gene') & (vmnrCopyNumbers['score']>800)])
Out[183]:
466
In [184]:
vmnrCopyNumbers[(vmnrCopyNumbers['type']=='gene') & ((vmnrCopyNumbers['score']>800))].copies.sum()
Out[184]:
478
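
The eight cells above apply the same filter at two score cutoffs. At score > 1000 there are 196 gene models (206 copies) on scaffold 45 and 205 (215 copies) genome wide; at score > 800 there are 309 (319 copies) on scaffold 45 and 466 (478 copies) genome wide. A hedged sketch of a helper that returns both counts in one call is shown below. The function name `summarize_gene_hits`, the dict-based record layout, and the toy records are assumptions for illustration.

```python
def summarize_gene_hits(records, min_score, scaffold=None):
    """Count gene models and CNV-adjusted copies above a score cutoff.

    records: iterable of dicts with 'seqnames', 'type', 'score', 'copies'.
    Returns (number_of_gene_models, total_copies).
    """
    models = 0
    copies = 0
    for rec in records:
        if rec['type'] != 'gene':
            continue  # only gene rows carry an alignment score
        if rec['score'] is None or rec['score'] <= min_score:
            continue
        if scaffold is not None and rec['seqnames'] != scaffold:
            continue
        models += 1
        copies += rec['copies']
    return models, copies

# Made-up records, not real alignments.
toy = [
    {'seqnames': 'Scpiz6a_45', 'type': 'gene', 'score': 1200.0, 'copies': 1},
    {'seqnames': 'Scpiz6a_45', 'type': 'exon', 'score': None, 'copies': 1},
    {'seqnames': 'Scpiz6a_45', 'type': 'gene', 'score': 900.0, 'copies': 3},
    {'seqnames': 'Scpiz6a_49', 'type': 'gene', 'score': 683.0, 'copies': 1},
]
strict = summarize_gene_hits(toy, 1000, scaffold='Scpiz6a_45')
relaxed = summarize_gene_hits(toy, 800)
```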
In [185]:
sorted(vmnrCopyNumbers[vmnrCopyNumbers['type']=='exon']['broken_bars'], key = lambda x: x[1])[0]
Out[185]:
(738.6374, 0.0002)
In [186]:
sorted(vmnrCopyNumbers[vmnrCopyNumbers['type']=='gene']['broken_bars'], key = lambda x: x[1])[0]
Out[186]:
(9.7566, 0.0072)
In [187]:
vmnrCopyNumbers.cnv.unique()
Out[187]:
array(['CN1', 'CN3', 'CN4'], dtype=object)

V2R Tree

In [188]:
Image('../fig/single_vomeronasal_proteins alignment FastTree Tree_names.newick.png')
Out[188]:
In [189]:
Image('../fig/single_vomeronasal_proteins alignment FastTree Tree.newick.png')
Out[189]:
In [190]:
%%bash
cp "../fig/single_vomeronasal_proteins alignment FastTree Tree.newick.pdf" ../fig2/Figure7C.pdf

Embryonic FACS Analysis

In [191]:
%%bash
cp ../fig/14-Jun-2017-Layout2_real_percentages.pdf.pdf ../fig2/Figure5.pdf
In [192]:
Image('../fig/14-Jun-2017-Layout2_real_percentages.jpg')
Out[192]:

Blood FACS Analysis

In [193]:
%%bash
cp ../fig/blood_ploidy_1.pdf ../fig2/Figure6A.pdf
cp ../fig/blood_ploidy_2.pdf ../fig2/Figure6B.pdf
In [194]:
Image('../fig/blood_ploidy2.png.png')
Out[194]:
In [195]:
Image('../fig/blood_ploidy1.png')
Out[195]:
In [196]:
%%bash
cp ../data/8449-8450\ FACS.xlsx ../data/supplemental_data/supplemental_table_8.xlsx

Wild Caught Microsatellite Analysis

In [197]:
wildExcelFile = pd.ExcelFile('../data/20180122_Wild_marmoratus_genotyping_results.xlsx')
wildExcelFile.sheet_names
Out[197]:
[u'MOLGN_15379_15380_15381',
 u'MOLGN_15411',
 u'MOLGN_15403_15404',
 u'MOLGN_15396_15397',
 u'MOLGN_15393_15394_15395',
 u'MOLGN_15299']
In [198]:
wildGenotypeDF = pd.concat([wildExcelFile.parse(sheet) for sheet in wildExcelFile.sheet_names], axis=0)
wildGenotypeDF.reset_index(inplace=True, drop=True)
In [199]:
wildGenotypeDF.head()
wildGenotypeDF.columns
Out[199]:
Index([  u'Allele 1',   u'Allele 2',   u'Allele 3',   u'Allele 4',
         u'Allele 5',   u'Comments',   u'Height 1',   u'Height 2',
         u'Height 3',   u'Height 4',   u'Height 5',         u'MS',
        u'Sample ID',     u'Size 1',     u'Size 2',     u'Size 3',
           u'Size 4',     u'Size 5', u'Unnamed: 4', u'Unnamed: 7',
       u'Unnamed: 9',   u'comments'],
      dtype='object')
In [200]:
wildGenotypeDF = wildGenotypeDF.filter(regex='Sample.*|Size.*|Height.*')
wildGenotypeDF['sample_name'] = wildGenotypeDF['Sample ID'].apply(lambda x: x.split('-')[0])
wildGenotypeDF.head()
Out[200]:
Height 1 Height 2 Height 3 Height 4 Height 5 Sample ID Size 1 Size 2 Size 3 Size 4 Size 5 sample_name
0 NaN NaN NaN NaN NaN 20545_50-A105-LizardMS NaN NaN NaN NaN NaN 20545_50
1 5001.0 5348.0 NaN NaN NaN 20545_50-Ai5013-LizardMS 213.60 220.39 NaN NaN NaN 20545_50
2 NaN NaN NaN NaN NaN 20545_50-Ai5043-LizardMS NaN NaN NaN NaN NaN 20545_50
3 2724.0 2731.0 NaN NaN NaN 20545_50-Cvanu24-LizardMS 200.00 203.92 NaN NaN NaN 20545_50
4 2936.0 NaN NaN NaN NaN 20545_50-Cvanu7-LizardMS 331.63 NaN NaN NaN NaN 20545_50
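
The `sample_name` column is the part of `Sample ID` before the first hyphen. As a small sketch, the same split can also recover the second field, which I assume here to be the microsatellite marker name, and the number of peaks called for a row can be counted from its `Size` values. The function names `parse_sample_id` and `count_alleles` are hypothetical, and the split would break for sample names that themselves contain a hyphen.

```python
def parse_sample_id(sample_id):
    """Split 'SAMPLE-MARKER-suffix' into (sample, marker)."""
    parts = sample_id.split('-')
    sample = parts[0]
    # Assumption: the second field names the microsatellite marker.
    marker = parts[1] if len(parts) > 1 else None
    return sample, marker

def count_alleles(sizes):
    """Count the size calls that are present (neither None nor NaN)."""
    return sum(1 for s in sizes if s is not None and s == s)

# Values copied from row 1 of the table above.
parsed = parse_sample_id('20545_50-Ai5013-LizardMS')
n_peaks = count_alleles([213.60, 220.39, float('nan'), float('nan'), float('nan')])
```

A row with three or more size calls is what the next cell looks for when it selects samples with a non-null `Size 3`.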
In [201]:
for sample in wildGenotypeDF[wildGenotypeDF['Size 3'].notnull()].sample_name.unique():
    display(wildGenotypeDF[wildGenotypeDF.sample_name == sample])
Height 1 Height 2 Height 3 Height 4 Height 5 Sample ID Size 1 Size 2 Size 3 Size 4 Size 5 sample_name
1464 NaN NaN NaN NaN NaN TJH3028-A105-LizardMS_F07.fsa NaN NaN NaN NaN NaN TJH3028
1465 9703.0 7719.0 NaN NaN NaN TJH3028-Ai5013-LizardMS_F08.fsa 226.05 255.51 NaN NaN NaN TJH3028
1466 11107.0 NaN NaN NaN NaN TJH3028-Ai5043-LizardMS_F12.fsa 175.63 NaN NaN NaN NaN TJH3028
1467 32656.0 NaN NaN NaN NaN TJH3028-Cvanu24-LizardMS_F06.fsa 199.82 NaN NaN NaN NaN TJH3028
1468 1345.0 4682.0 NaN NaN NaN TJH3028-Cvanu7-LizardMS_F05.fsa 342.13 351.25 NaN NaN NaN TJH3028
1469 359.0 322.0 6870.0 NaN NaN TJH3028-D106-LizardMS_F09.fsa 264.53 285.45 457.81 NaN NaN TJH3028
1470 3881.0 2537.0 NaN NaN not sure about 146 as single peak only TJH3028-D107-LizardMS_F10.fsa 140.76 145.97 NaN NaN NaN TJH3028
1471 NaN NaN NaN NaN NaN TJH3028-D111-LizardMS_F11.fsa NaN NaN NaN NaN NaN TJH3028
1472 30216.0 NaN NaN NaN NaN TJH3028-MS1-LizardMS_F01.fsa 234.96 NaN NaN NaN NaN TJH3028
1473 32588.0 NaN NaN NaN NaN TJH3028-MS6-LizardMS_F02.fsa 176.08 NaN NaN NaN NaN TJH3028
1474 NaN NaN NaN NaN NaN TJH3028-MS7-LizardMS_F03.fsa NaN NaN NaN NaN NaN TJH3028
1475 32086.0 1549.0 NaN NaN NaN TJH3028-MS8-LizardMS_F04.fsa 113.19 138.90 NaN NaN NaN TJH3028
Height 1 Height 2 Height 3 Height 4 Height 5 Sample ID Size 1 Size 2 Size 3 Size 4 Size 5 sample_name
2712 64.0 39.0 58.0 NaN NaN 104075-A105-LizardMS 219.37 227.77 238.69 NaN NaN 104075
2713 26543.0 22494.0 NaN NaN NaN 104075-Ai5013-LizardMS 217.46 236.64 NaN NaN NaN 104075
2714 4418.0 4625.0 NaN NaN NaN 104075-Ai5043-LizardMS 175.80 181.56 NaN NaN NaN 104075
2715 31654.0 23268.0 NaN NaN NaN 104075-Cvanu24-LizardMS 199.91 203.87 NaN NaN NaN 104075
2716 5253.0 3638.0 NaN NaN NaN 104075-Cvanu7-LizardMS 331.55 334.64 NaN NaN NaN 104075
2717 6457.0 5573.0 NaN NaN NaN 104075-D106-LizardMS 294.97 303.13 NaN NaN NaN 104075
2718 374.0 NaN NaN NaN NaN 104075-D107-LizardMS 144.13 NaN NaN NaN NaN 104075
2719 5985.0 NaN NaN NaN NaN 104075-D111-LizardMS 150.00 NaN NaN NaN NaN 104075
2720 15727.0 12459.0 NaN NaN NaN 104075-MS1-LizardMS 217.79 234.96 NaN NaN NaN 104075
2721 32488.0 NaN NaN NaN NaN 104075-MS6-LizardMS 174.11 NaN NaN NaN NaN 104075
2722 22967.0 5195.0 NaN NaN NaN 104075-MS7-LizardMS 274.52 285.27 NaN NaN NaN 104075
2723 31064.0 NaN NaN NaN NaN 104075-MS8-LizardMS 112.98 NaN NaN NaN NaN 104075
Height 1 Height 2 Height 3 Height 4 Height 5 Sample ID Size 1 Size 2 Size 3 Size 4 Size 5 sample_name
2736 78.0 109.0 83.0 58.0 NaN 104082-A105-LizardMS 193.93 219.15 235.55 255.87 NaN 104082
2737 32477.0 31434.0 NaN NaN NaN 104082-Ai5013-LizardMS 220.19 236.54 NaN NaN NaN 104082
2738 18103.0 NaN NaN NaN NaN 104082-Ai5043-LizardMS 175.62 NaN NaN NaN NaN 104082
2739 32158.0 25910.0 NaN NaN NaN 104082-Cvanu24-LizardMS 199.91 201.87 NaN NaN NaN 104082
2740 32364.0 NaN NaN NaN NaN 104082-Cvanu7-LizardMS 331.32 NaN NaN NaN NaN 104082
2741 NaN NaN NaN NaN NaN 104082-D106-LizardMS NaN NaN NaN NaN NaN 104082
2742 4094.0 3748.0 NaN NaN NaN 104082-D107-LizardMS 149.80 154.15 NaN NaN NaN 104082
2743 7918.0 5367.0 NaN NaN NaN 104082-D111-LizardMS 150.09 202.98 NaN NaN NaN 104082
2744 15817.0 14636.0 NaN NaN NaN 104082-MS1-LizardMS 216.85 235.03 NaN NaN NaN 104082
2745 32128.0 NaN NaN NaN NaN 104082-MS6-LizardMS 173.95 NaN NaN NaN NaN 104082
2746 32685.0 24497.0 NaN NaN NaN 104082-MS7-LizardMS 227.08 258.10 NaN NaN NaN 104082
2747 31102.0 31552.0 NaN NaN NaN 104082-MS8-LizardMS 113.03 117.37 NaN NaN NaN 104082
Height 1 Height 2 Height 3 Height 4 Height 5 Sample ID Size 1 Size 2 Size 3 Size 4 Size 5 sample_name
2772 60.0 62.0 NaN NaN NaN 104142-A105-LizardMS 221.68 231.00 NaN NaN NaN 104142
2773 13684.0 24650.0 NaN NaN NaN 104142-Ai5013-LizardMS 224.20 233.49 NaN NaN NaN 104142
2774 1147.0 NaN NaN NaN NaN 104142-Ai5043-LizardMS 175.66 NaN NaN NaN NaN 104142
2775 18467.0 25741.0 NaN NaN NaN 104142-Cvanu24-LizardMS 201.85 203.79 NaN NaN NaN 104142
2776 5604.0 4056.0 2654.0 NaN NaN 104142-Cvanu7-LizardMS 315.33 331.53 334.65 NaN NaN 104142
2777 6119.0 11117.0 NaN NaN NaN 104142-D106-LizardMS 286.83 319.68 NaN NaN NaN 104142
2778 68.0 103.0 68.0 34.0 NaN 104142-D107-LizardMS 139.42 144.03 297.12 300.45 NaN 104142
2779 3001.0 NaN NaN NaN NaN 104142-D111-LizardMS 150.00 NaN NaN NaN NaN 104142
2780 32672.0 NaN NaN NaN NaN 104142-MS1-LizardMS 234.84 NaN NaN NaN NaN 104142
2781 32323.0 NaN NaN NaN NaN 104142-MS6-LizardMS 173.92 NaN NaN NaN NaN 104142
2782 20527.0 7502.0 4529.0 NaN NaN 104142-MS7-LizardMS 227.22 254.25 269.70 NaN NaN 104142
2783 31030.0 19019.0 NaN NaN NaN 104142-MS8-LizardMS 112.19 117.40 NaN NaN NaN 104142
Height 1 Height 2 Height 3 Height 4 Height 5 Sample ID Size 1 Size 2 Size 3 Size 4 Size 5 sample_name
2784 81.0 86.0 33.0 NaN NaN 104148-A105-LizardMS 215.96 223.75 239.83 NaN NaN 104148
2785 32341.0 31900.0 NaN NaN NaN 104148-Ai5013-LizardMS 217.94 225.87 NaN NaN NaN 104148
2786 15320.0 NaN NaN NaN NaN 104148-Ai5043-LizardMS 178.43 NaN NaN NaN NaN 104148
2787 31667.0 22448.0 NaN NaN NaN 104148-Cvanu24-LizardMS 199.82 201.87 NaN NaN NaN 104148
2788 18418.0 9184.0 NaN NaN NaN 104148-Cvanu7-LizardMS 331.52 343.05 NaN NaN NaN 104148
2789 26369.0 NaN NaN NaN NaN 104148-D106-LizardMS 298.78 NaN NaN NaN NaN 104148
2790 951.0 1260.0 NaN NaN NaN 104148-D107-LizardMS 139.42 140.67 NaN NaN NaN 104148
2791 8774.0 NaN NaN NaN NaN 104148-D111-LizardMS 150.10 NaN NaN NaN NaN 104148
2792 18387.0 NaN NaN NaN NaN 104148-MS1-LizardMS 216.82 NaN NaN NaN NaN 104148
2793 32387.0 NaN NaN NaN NaN 104148-MS6-LizardMS 173.97 NaN NaN NaN NaN 104148
2794 24150.0 19251.0 NaN NaN NaN 104148-MS7-LizardMS 250.33 254.26 NaN NaN NaN 104148
2795 31138.0 NaN NaN NaN NaN 104148-MS8-LizardMS 113.06 NaN NaN NaN NaN 104148
Height 1 Height 2 Height 3 Height 4 Height 5 Sample ID Size 1 Size 2 Size 3 Size 4 Size 5 sample_name
2796 136.0 106.0 202.0 NaN NaN 104192-A105-LizardMS 201.15 205.06 226.95 NaN NaN 104192
2797 9322.0 8966.0 NaN NaN NaN 104192-Ai5013-LizardMS 241.50 247.02 NaN NaN NaN 104192
2798 2751.0 NaN NaN NaN NaN 104192-Ai5043-LizardMS 181.31 NaN NaN NaN NaN 104192
2799 16578.0 14548.0 NaN NaN NaN 104192-Cvanu24-LizardMS 199.91 201.87 NaN NaN NaN 104192
2800 1693.0 1309.0 NaN NaN NaN 104192-Cvanu7-LizardMS 331.53 338.83 NaN NaN NaN 104192
2801 2721.0 NaN NaN NaN NaN 104192-D106-LizardMS 298.87 NaN NaN NaN NaN 104192
2802 330.0 302.0 NaN NaN NaN 104192-D107-LizardMS 127.67 154.15 NaN NaN NaN 104192
2803 1910.0 NaN NaN NaN NaN 104192-D111-LizardMS 150.00 NaN NaN NaN NaN 104192
2804 8394.0 8372.0 NaN NaN NaN 104192-MS1-LizardMS 216.75 234.91 NaN NaN NaN 104192
2805 13653.0 10479.0 NaN NaN NaN 104192-MS6-LizardMS 172.11 174.08 NaN NaN NaN 104192
2806 13723.0 7639.0 NaN NaN NaN 104192-MS7-LizardMS 219.50 254.22 NaN NaN NaN 104192
2807 9612.0 6378.0 NaN NaN NaN 104192-MS8-LizardMS 117.41 130.13 NaN NaN NaN 104192
Height 1 Height 2 Height 3 Height 4 Height 5 Sample ID Size 1 Size 2 Size 3 Size 4 Size 5 sample_name
2820 53.0 52.0 58.0 68.0 NaN 104245-A105-LizardMS 193.81 195.76 200.44 217.25 NaN 104245
2821 29737.0 19672.0 NaN NaN NaN 104245-Ai5013-LizardMS 216.28 243.08 NaN NaN NaN 104245
2822 3069.0 2911.0 NaN NaN NaN 104245-Ai5043-LizardMS 178.60 181.50 NaN NaN NaN 104245
2823 32635.0 NaN NaN NaN NaN 104245-Cvanu24-LizardMS 199.74 NaN NaN NaN NaN 104245
2824 16668.0 NaN NaN NaN NaN 104245-Cvanu7-LizardMS 331.71 NaN NaN NaN NaN 104245
2825 19040.0 NaN NaN NaN NaN 104245-D106-LizardMS 307.32 NaN NaN NaN NaN 104245
2826 667.0 572.0 NaN NaN NaN 104245-D107-LizardMS 126.64 157.21 NaN NaN NaN 104245
2827 6616.0 NaN NaN NaN NaN 104245-D111-LizardMS 225.94 NaN NaN NaN NaN 104245
2828 32718.0 NaN NaN NaN NaN 104245-MS1-LizardMS 216.79 NaN NaN NaN NaN 104245
2829 31649.0 18384.0 NaN NaN NaN 104245-MS6-LizardMS 174.15 176.17 NaN NaN NaN 104245
2830 14797.0 19173.0 NaN NaN NaN 104245-MS7-LizardMS 265.88 266.74 NaN NaN NaN 104245
2831 30208.0 21609.0 NaN NaN NaN 104245-MS8-LizardMS 113.29 117.48 NaN NaN NaN 104245
Height 1 Height 2 Height 3 Height 4 Height 5 Sample ID Size 1 Size 2 Size 3 Size 4 Size 5 sample_name
2832 49.0 82.0 68.0 NaN NaN 104250-A105-LizardMS 198.13 226.43 245.61 NaN NaN 104250
2833 32579.0 27497.0 NaN NaN NaN 104250-Ai5013-LizardMS 224.24 243.35 NaN NaN NaN 104250
2834 945.0 936.0 NaN NaN NaN 104250-Ai5043-LizardMS 175.68 178.56 NaN NaN NaN 104250
2835 31628.0 15008.0 NaN NaN NaN 104250-Cvanu24-LizardMS 203.88 207.85 NaN NaN NaN 104250
2836 23000.0 NaN NaN NaN NaN 104250-Cvanu7-LizardMS 331.52 NaN NaN NaN NaN 104250
2837 NaN NaN NaN NaN NaN 104250-D106-LizardMS NaN NaN NaN NaN NaN 104250
2838 NaN NaN NaN NaN NaN 104250-D107-LizardMS NaN NaN NaN NaN NaN 104250
2839 NaN NaN NaN NaN NaN 104250-D111-LizardMS NaN NaN NaN NaN NaN 104250
2840 18385.0 15933.0 NaN NaN NaN 104250-MS1-LizardMS 216.83 235.00 NaN NaN NaN 104250
2841 32572.0 29951.0 NaN NaN NaN 104250-MS6-LizardMS 157.85 174.31 NaN NaN NaN 104250
2842 26503.0 11294.0 NaN NaN NaN 104250-MS7-LizardMS 258.06 290.12 NaN NaN NaN 104250
2843 30039.0 NaN NaN NaN NaN 104250-MS8-LizardMS 113.15 NaN NaN NaN NaN 104250
Height 1 Height 2 Height 3 Height 4 Height 5 Sample ID Size 1 Size 2 Size 3 Size 4 Size 5 sample_name
2868 71.0 58.0 54.0 NaN NaN 104423-A105-LizardMS 192.93 233.97 238.74 NaN NaN 104423
2869 13574.0 8755.0 NaN NaN NaN 104423-Ai5013-LizardMS 204.80 240.35 NaN NaN NaN 104423
2870 6048.0 NaN NaN NaN NaN 104423-Ai5043-LizardMS 175.56 NaN NaN NaN NaN 104423
2871 25610.0 NaN NaN NaN NaN 104423-Cvanu24-LizardMS 203.75 NaN NaN NaN NaN 104423
2872 7732.0 5801.0 NaN NaN NaN 104423-Cvanu7-LizardMS 332.51 334.56 NaN NaN NaN 104423
2873 NaN NaN NaN NaN NaN 104423-D106-LizardMS NaN NaN NaN NaN NaN 104423
2874 NaN NaN NaN NaN NaN 104423-D107-LizardMS NaN NaN NaN NaN NaN 104423
2875 5111.0 4661.0 NaN NaN NaN 104423-D111-LizardMS 241.06 249.01 NaN NaN NaN 104423
2876 21010.0 NaN NaN NaN NaN 104423-MS1-LizardMS 234.98 NaN NaN NaN NaN 104423
2877 32584.0 NaN NaN NaN NaN 104423-MS6-LizardMS 171.91 NaN NaN NaN NaN 104423
2878 26582.0 14283.0 NaN NaN NaN 104423-MS7-LizardMS 250.30 269.72 NaN NaN NaN 104423
2879 30847.0 NaN NaN NaN NaN 104423-MS8-LizardMS 110.77 NaN NaN NaN NaN 104423
Height 1 Height 2 Height 3 Height 4 Height 5 Sample ID Size 1 Size 2 Size 3 Size 4 Size 5 sample_name
2880 118.0 45.0 83.0 76.0 NaN 104426-A105-LizardMS 192.77 207.78 211.79 219.65 NaN 104426
2881 22638.0 21094.0 NaN NaN NaN 104426-Ai5013-LizardMS 212.62 220.58 NaN NaN NaN 104426
2882 13331.0 NaN NaN NaN NaN 104426-Ai5043-LizardMS 175.59 NaN NaN NaN NaN 104426
2883 32510.0 NaN NaN NaN NaN 104426-Cvanu24-LizardMS 203.58 NaN NaN NaN NaN 104426
2884 28604.0 21127.0 NaN NaN NaN 104426-Cvanu7-LizardMS 332.51 334.65 NaN NaN NaN 104426
2885 NaN NaN NaN NaN NaN 104426-D106-LizardMS NaN NaN NaN NaN NaN 104426
2886 99.0 82.0 NaN NaN NaN 104426-D107-LizardMS 269.94 272.94 NaN NaN NaN 104426
2887 2847.0 2594.0 NaN NaN NaN 104426-D111-LizardMS 253.29 269.73 NaN NaN NaN 104426
2888 31870.0 NaN NaN NaN NaN 104426-MS1-LizardMS 235.06 NaN NaN NaN NaN 104426
2889 32565.0 NaN NaN NaN NaN 104426-MS6-LizardMS 171.98 NaN NaN NaN NaN 104426
2890 29906.0 3804.0 NaN NaN NaN 104426-MS7-LizardMS 254.17 285.30 NaN NaN NaN 104426
2891 29604.0 NaN NaN NaN NaN 104426-MS8-LizardMS 110.78 NaN NaN NaN NaN 104426
Height 1 Height 2 Height 3 Height 4 Height 5 Sample ID Size 1 Size 2 Size 3 Size 4 Size 5 sample_name
2892 48.0 105.0 93.0 153.0 151 104479-A105-LizardMS 201.95 205.23 217.9 234.79 239.57 104479
2893 21389.0 13995.0 NaN NaN NaN 104479-Ai5013-LizardMS 221.16 246.99 NaN NaN NaN 104479
2894 2248.0 NaN NaN NaN NaN 104479-Ai5043-LizardMS 178.51 NaN NaN NaN NaN 104479
2895 32575.0 NaN NaN NaN NaN 104479-Cvanu24-LizardMS 199.73 NaN NaN NaN NaN 104479
2896 5873.0 5266.0 NaN NaN NaN 104479-Cvanu7-LizardMS 331.62 334.65 NaN NaN NaN 104479
2897 NaN NaN NaN NaN NaN 104479-D106-LizardMS NaN NaN NaN NaN NaN 104479
2898 167.0 126.0 NaN NaN NaN 104479-D107-LizardMS 126.51 139.42 NaN NaN NaN 104479
2899 NaN NaN NaN NaN NaN 104479-D111-LizardMS NaN NaN NaN NaN NaN 104479
2900 30043.0 NaN NaN NaN NaN 104479-MS1-LizardMS 234.91 NaN NaN NaN NaN 104479
2901 32562.0 NaN NaN NaN NaN 104479-MS6-LizardMS 174.14 NaN NaN NaN NaN 104479
2902 21475.0 16363.0 NaN NaN NaN 104479-MS7-LizardMS 258.07 261.97 NaN NaN NaN 104479
2903 30969.0 NaN NaN NaN NaN 104479-MS8-LizardMS 113.02 NaN NaN NaN NaN 104479
Height 1 Height 2 Height 3 Height 4 Height 5 Sample ID Size 1 Size 2 Size 3 Size 4 Size 5 sample_name
2904 50.0 79.0 73.0 NaN NaN 104480-A105-LizardMS 204.15 233.58 238.33 NaN NaN 104480
2905 31844.0 29471.0 NaN NaN NaN 104480-Ai5013-LizardMS 235.37 240.15 NaN NaN NaN 104480
2906 9667.0 9326.0 NaN NaN NaN 104480-Ai5043-LizardMS 175.68 178.54 NaN NaN NaN 104480
2907 32610.0 NaN NaN NaN NaN 104480-Cvanu24-LizardMS 199.73 NaN NaN NaN NaN 104480
2908 17600.0 17465.0 NaN NaN NaN 104480-Cvanu7-LizardMS 317.48 331.55 NaN NaN NaN 104480
2909 15628.0 12419.0 NaN NaN NaN 104480-D106-LizardMS 290.84 319.59 NaN NaN NaN 104480
2910 43.0 NaN NaN NaN NaN 104480-D107-LizardMS 182.48 NaN NaN NaN NaN 104480
2911 14296.0 NaN NaN NaN NaN 104480-D111-LizardMS 150.00 NaN NaN NaN NaN 104480
2912 29710.0 22374.0 NaN NaN NaN 104480-MS1-LizardMS 217.73 235.00 NaN NaN NaN 104480
2913 31625.0 21138.0 NaN NaN NaN 104480-MS6-LizardMS 172.01 174.13 NaN NaN NaN 104480
2914 32724.0 13099.0 NaN NaN NaN 104480-MS7-LizardMS 235.63 261.93 NaN NaN NaN 104480
2915 31883.0 25148.0 NaN NaN NaN 104480-MS8-LizardMS 108.93 117.43 NaN NaN NaN 104480
In [202]:
display(wildGenotypeDF[wildGenotypeDF.sample_name == 'ESP9281'])
Height 1 Height 2 Height 3 Height 4 Height 5 Sample ID Size 1 Size 2 Size 3 Size 4 Size 5 sample_name
2544 NaN NaN NaN NaN NaN ESP9281-A105-LizardMS_A07.fsa NaN NaN NaN NaN NaN ESP9281
2545 26417.0 NaN NaN NaN NaN ESP9281-Ai5013-LizardMS_A08.fsa 253.87 NaN NaN NaN NaN ESP9281
2546 13201.0 12823.0 NaN NaN NaN ESP9281-Ai5043-LizardMS_A12.fsa 173.03 175.87 NaN NaN NaN ESP9281
2547 35701.0 NaN NaN NaN NaN ESP9281-Cvanu24-LizardMS_A06.fsa 200.26 NaN NaN NaN NaN ESP9281
2548 1783.0 5034.0 NaN NaN NaN ESP9281-Cvanu7-LizardMS_A05.fsa 337.95 338.99 NaN NaN NaN ESP9281
2549 11356.0 NaN NaN NaN NaN ESP9281-D106-LizardMS_A09.fsa 398.02 NaN NaN NaN NaN ESP9281
2550 572.0 333.0 NaN NaN NaN ESP9281-D107-LizardMS_A10.fsa 140.65 189.82 NaN NaN NaN ESP9281
2551 3970.0 NaN NaN NaN NaN ESP9281-D111-LizardMS_A11.fsa 150.09 NaN NaN NaN NaN ESP9281
2552 32654.0 NaN NaN NaN NaN ESP9281-MS1-LizardMS_A01.fsa 235.00 NaN NaN NaN NaN ESP9281
2553 32531.0 NaN NaN NaN NaN ESP9281-MS6-LizardMS_A02.fsa 175.99 NaN NaN NaN NaN ESP9281
2554 38004.0 15960.0 NaN NaN NaN ESP9281-MS7-LizardMS_A03.fsa 227.41 286.21 NaN NaN NaN ESP9281
2555 35119.0 NaN NaN NaN NaN ESP9281-MS8-LizardMS_A04.fsa 113.66 NaN NaN NaN NaN ESP9281
In [203]:
def call_float(x):
    # Return x as a float; free-text notes and missing values become NaN.
    try:
        return float(x)
    except (ValueError, TypeError):
        return np.nan

for column in wildGenotypeDF.filter(regex='Height|Size').columns:
    wildGenotypeDF[column] = wildGenotypeDF[column].apply(call_float)

wildGenotypeDF.columns = [c.lower().replace(' ', '_') for c in wildGenotypeDF.columns]
wildGenotypeDF.head()
Out[203]:
height_1 height_2 height_3 height_4 height_5 sample_id size_1 size_2 size_3 size_4 size_5 sample_name
0 NaN NaN NaN NaN NaN 20545_50-A105-LizardMS NaN NaN NaN NaN NaN 20545_50
1 5001.0 5348.0 NaN NaN NaN 20545_50-Ai5013-LizardMS 213.60 220.39 NaN NaN NaN 20545_50
2 NaN NaN NaN NaN NaN 20545_50-Ai5043-LizardMS NaN NaN NaN NaN NaN 20545_50
3 2724.0 2731.0 NaN NaN NaN 20545_50-Cvanu24-LizardMS 200.00 203.92 NaN NaN NaN 20545_50
4 2936.0 NaN NaN NaN NaN 20545_50-Cvanu7-LizardMS 331.63 NaN NaN NaN NaN 20545_50
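The per-column `apply(call_float)` in the cell above can also be written with the vectorized `pd.to_numeric(..., errors='coerce')`, which turns unparseable entries such as free-text peak notes into NaN. The following is a minimal sketch on a made-up three-row frame; the name `toyDF` and its values are illustrative assumptions, not rows taken from `wildGenotypeDF`.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values): one Height cell holds a free-text note.
toyDF = pd.DataFrame({
    'Height 5': [np.nan, 'not sure about 146 as single peak only', '151'],
    'Size 5': ['239.57', None, np.nan],
    'Sample ID': ['s1-D107', 's2-D107', 's3-A105'],
})

# Coerce every Height/Size column to float; unparseable text becomes NaN.
for column in toyDF.filter(regex='Height|Size').columns:
    toyDF[column] = pd.to_numeric(toyDF[column], errors='coerce')

# Normalise the column names the same way as the cell above.
toyDF.columns = [c.lower().replace(' ', '_') for c in toyDF.columns]
print(toyDF.dtypes)
```

The `Sample ID` column is left untouched because the regular expression only matches the Height and Size columns.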
In [204]:
wildGenotypeDF[~wildGenotypeDF.sample_id.str.contains('A105')][['sample_id', 'size_1', 'size_2', 'height_1', 'height_2']].to_excel('../data/cleanWildCaughtGenotypes.xls')
In [205]:
wildMarmPopulation = bliz.MicrosatellitePopulation('../data/cleanWildCaughtGenotypes.xls', split_id=True)
In [206]:
wildMarmPopulation.calc_internal_relatedness()
In [207]:
wildMarmPopulation.ir_df
Out[207]:
sample_name homozygosity_by_loci internal_relatedness num_hom_loci total_loci
0 ASH95383 1.000000 1.000000 3 3
1 ESP9281 0.945705 0.716907 9 11
2 ASH77912 0.979907 0.693688 8 10
3 ASH79240 0.910078 0.618572 7 10
4 ASH92211 0.924920 0.607237 8 11
5 ASH276 0.897886 0.591083 2 3
6 DED30 0.870799 0.585639 8 11
7 TJH3049 0.943729 0.585429 8 11
8 ASH74810 0.919498 0.579133 6 9
9 ESP9339 0.955960 0.576756 8 11
10 MSB74811 0.943110 0.576405 6 9
11 DJL302 0.969596 0.574503 8 11
12 20545_50 0.727726 0.561240 6 9
13 MSB74845 0.949749 0.550232 6 9
14 ASH257 0.841659 0.535554 6 9
15 ASH95278 0.852358 0.534167 7 10
16 ASH50544 0.878625 0.520383 7 11
17 ASH9319 0.835545 0.493239 6 9
18 ASH3107 0.779146 0.489776 7 11
19 ESP9348 0.879563 0.479330 7 10
20 ASH95229 0.858512 0.469257 6 9
21 TJH3054_1 0.734402 0.457206 7 11
22 MSB74683 0.830661 0.453333 6 9
23 104245 0.602340 0.434585 6 11
24 TJH3052 0.894342 0.433347 7 11
25 DL931 0.810170 0.432749 7 11
26 DED18 0.885275 0.432286 7 11
27 TJH3471 0.894827 0.426000 6 10
28 ASH18536 0.862690 0.419264 4 7
29 TJH2868 0.873408 0.413793 7 11
... ... ... ... ... ...
213 ASH79183 0.722844 -0.045206 3 7
214 ASH207 0.565846 -0.046467 3 11
215 ASH273 0.679125 -0.052038 3 11
216 TJH2805 0.707373 -0.055218 3 11
217 ESP9250 0.617087 -0.055492 3 11
218 DED54 0.688651 -0.060361 3 11
219 TJH2928 0.534696 -0.063050 3 11
220 TJH2874 0.595521 -0.068655 3 9
221 ESP9286 0.641650 -0.071637 3 10
222 ASH1028 0.520457 -0.072709 3 11
223 ASH248 0.562779 -0.080929 2 8
224 TJH2875 0.614083 -0.084542 3 11
225 TJH2923 0.680991 -0.085451 3 10
226 TJH3032 0.634081 -0.086504 3 11
227 ASH95434 0.641563 -0.089494 3 10
228 104077 0.302531 -0.099657 2 10
229 ESP9236 0.664756 -0.100248 3 10
230 MTH543 0.339005 -0.100433 2 10
231 ASH253 0.637554 -0.100766 3 11
232 ASH147 0.562765 -0.103567 3 11
233 ASH193 0.338987 -0.104663 2 9
234 ASH219_50 0.509851 -0.122351 2 10
235 TJH2930 0.612763 -0.124122 3 11
236 ESP9212 0.617907 -0.128924 3 11
237 ASH229 0.494106 -0.136949 2 10
238 ASH250 0.557310 -0.185525 2 10
239 TJH2884 0.545215 -0.198337 2 10
240 GDC5008 0.543326 -0.200414 2 10
241 ASH182 0.528233 -0.211511 1 5
242 ASH252 0.592701 -0.214024 2 8

243 rows × 5 columns
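The `internal_relatedness` column is produced by `bliz.MicrosatellitePopulation`, whose implementation is not shown in this section. As a rough guide to what the score measures, the sketch below follows the usual definition of internal relatedness (Amos et al. 2001): IR = (2H - sum f_i) / (2N - sum f_i), where H is the number of homozygous loci, N the number of typed loci, and f_i the population frequency of each allele the animal carries. The function names, the dict-based input, and the toy genotypes are assumptions made for illustration; `bliz` may bin allele sizes or treat missing loci differently.

```python
from collections import Counter, defaultdict

def allele_frequencies(genotypes):
    """Per-locus allele frequencies from {sample: {locus: (allele1, allele2)}}."""
    counts = defaultdict(Counter)
    for loci in genotypes.values():
        for locus, alleles in loci.items():
            counts[locus].update(alleles)
    freqs = {}
    for locus, counter in counts.items():
        total = float(sum(counter.values()))
        freqs[locus] = {allele: n / total for allele, n in counter.items()}
    return freqs

def internal_relatedness(loci, freqs):
    """IR = (2H - sum(f_i)) / (2N - sum(f_i)) over the loci typed for one animal."""
    num_hom = sum(1 for a, b in loci.values() if a == b)
    freq_sum = sum(freqs[locus][allele]
                   for locus, alleles in loci.items()
                   for allele in alleles)
    return (2 * num_hom - freq_sum) / (2 * len(loci) - freq_sum)

# Hypothetical genotypes at two loci (allele sizes in base pairs).
toy_genotypes = {
    'animal_1': {'MS1': (235, 235), 'MS7': (227, 227)},
    'animal_2': {'MS1': (217, 235), 'MS7': (227, 258)},
    'animal_3': {'MS1': (217, 235), 'MS7': (254, 258)},
}
toy_freqs = allele_frequencies(toy_genotypes)
for name in sorted(toy_genotypes):
    print(name, round(internal_relatedness(toy_genotypes[name], toy_freqs), 3))
```

Under this definition an animal that is homozygous at every typed locus always scores 1, which is consistent with the first row of the table above (3 of 3 loci homozygous).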

Merge figures

In [208]:
%%bash
cd ../fig2
rm -f merged_figures.pdf
pdfunite *.pdf merged_figures.pdf
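When merging with a `*.pdf` glob, a previously merged output sitting in the same directory would be swept into the next merge unless it is removed or excluded first. The sketch below builds the same `pdfunite` argument list from Python with the output file excluded explicitly. The helper name `build_pdfunite_command` is hypothetical, and the actual `subprocess` call is left commented out because it assumes `pdfunite` is installed and on the PATH.

```python
import glob
import os
import tempfile

def build_pdfunite_command(fig_dir, output_name='merged_figures.pdf'):
    """Return the argument list: pdfunite <sorted inputs...> <output>."""
    output_path = os.path.join(fig_dir, output_name)
    inputs = sorted(p for p in glob.glob(os.path.join(fig_dir, '*.pdf'))
                    if os.path.abspath(p) != os.path.abspath(output_path))
    if not inputs:
        raise ValueError('no PDF files found in %s' % fig_dir)
    return ['pdfunite'] + inputs + [output_path]

# Demonstration on a throwaway directory with empty placeholder files.
demo_dir = tempfile.mkdtemp()
for name in ('b_panel.pdf', 'a_panel.pdf', 'merged_figures.pdf'):
    open(os.path.join(demo_dir, name), 'w').close()
demo_cmd = build_pdfunite_command(demo_dir)
print([os.path.basename(p) for p in demo_cmd])

# To run the merge for real (requires pdfunite):
# import subprocess
# subprocess.check_call(build_pdfunite_command('../fig2'))
```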

Grouped Figures

In [209]:
marm_ir = marmPopulation.ir_df
sns.set(style='ticks')
plt.figure(figsize=(6.4,1.2))
ax = sns.distplot(marm_ir['internal_relatedness'], kde=False, rug=False, bins=30, color='darkred', hist_kws={'alpha':0.6})
ax.set_ylim((0,8.5))
ax.set_ylabel('Number of Animals', fontsize=minorFontSize)
ax.set_xlabel('Internal Relatedness', fontsize=minorFontSize)
#ax.set_title('Internal Relatedness of 44 Aspidoscelis marmorata')
ax.arrow(.978,2.8,0.0,-0.5, width=0.01, head_width=0.035, head_length=0.25, color='#3498db')
ax.arrow(0.355 ,2.8,0.0,-0.5, width=0.01, head_width=0.035, head_length=0.25, color="#9b59b6")
ax.set_xlim(-0.5,1.1)
ax.set_xticklabels(ax.get_xticks(), fontsize=minorFontSize)
ax.set_yticklabels(ax.get_yticks(), fontsize=minorFontSize)
#0.336735
#ax.spines['top'].set_visible(False)
#ax.spines['right'].set_visible(False)
ax.get_xaxis().tick_bottom()
ax.get_yaxis().tick_left()
fig = ax.get_figure()
fig.savefig('../fig/internal_relatedness.pdf', bbox_inches='tight', pad_inches=0)
In [210]:
fig=plt.figure(figsize=(6.2,2.5))
fig.subplots_adjust(hspace=0.2,wspace=0.40)
gs = gridspec.GridSpec(1, 2)
ax1 = plt.subplot(gs[0, 0])
ax2 = plt.subplot(gs[0, 1])
#############################################################################################
ax1 = nxDF.plot('perc','size',fontsize=minorFontSize, legend=False, linewidth=0.8, ax=ax1)

ax1.get_yaxis().get_major_formatter().set_scientific(False)
ax1.get_yaxis().set_major_formatter(FuncFormatter(lambda x, p: format(int(x/1000000), ',')))
ax1.set_ylabel('Scaffold Size (MB)',fontsize=minorFontSize)


ax1.set_xlabel('Percentage of Total Bases in Genome (%)',fontsize=minorFontSize)
ax1.set_xticks(np.arange(0,120,20))
#ax1.set_title('${Aspidoscelis\ marmoratus}$ Genome Continuity',fontsize=majorFontSize)

ax1.vlines(x=50,color ='r',ymin=0,ymax=90000000,linestyle='--', linewidth=0.8)
ax1.hlines(y=32220929,color ='r',xmin=0,xmax=100,linestyle='--', linewidth=0.8)
ax1.annotate('N50 = 32.22 MB',xy=(15,28000000),fontsize=minorFontSize)

ax1.vlines(x=90,color ='g',ymin=0,ymax=90000000,linestyle='--', linewidth=0.8)
ax1.hlines(y=8340160,color ='g',xmin=0,xmax=100,linestyle='--', linewidth=0.8)
ax1.annotate('N90 = 8.34 MB',xy=(55,4500000),fontsize = minorFontSize)

ax1.get_xaxis().tick_bottom()
ax1.get_yaxis().tick_left()

ax1.text(-0.22, 1.05, 'A', transform=ax1.transAxes,
      fontsize=majorFontSize, fontweight='bold', va='top', ha='right')
#############################################################################################
organism_map = {'Gallus_gallus': 'Chicken', '2_Python_molurus_bivittatus-5': 'Burmese python', 'Aspidoscelis_marmorata': 'Aspidoscelis marmoratus', 
                'Pelodiscus_sinensis': 'Chinese softshell turtle', 'Homo_sapiens': 'Human', 'Canis_familiaris': 'Dog', 'Mus_musculus': 'Mouse', 
                'Danio_rerio': 'Zebrafish', 'Anolis_carolinensis': 'Carolina anole lizard'}

markers=['--o', '--v', '--^', '--<', '-->', '--8', '--s', '--p', '--h', '--H', '--D', '--d']
for i, organism in enumerate(gcSTD.organism.unique()):
    #name = organism.replace('2_','').replace('-5','').replace('_', ' ')
    name = organism_map[organism]
    gc_dist = gcSTD[gcSTD.organism == organism].copy()  # copy so the in-place sort does not act on a slice of gcSTD
    gc_dist.sort_values('window_size', inplace=True)
    ax2.plot(gc_dist['window_size'],
            gc_dist['std_dev'], 
            markers[i],
            label=name,
            markersize=4,
            linewidth=0.8)
print(organism_map)
ax2.set_xscale("log")

ax2.set_xticks([10] + gc_dist['window_size'].tolist())
ax2.get_xaxis().set_major_formatter(matplotlib.ticker.ScalarFormatter())
ax2.set_xlim(2770,330000)
xtick_labels = [str(int(float(x)/1000.0)) for x in ax2.get_xticks()]
ax2.set_xticklabels(xtick_labels, fontsize=minorFontSize)

ax2.set_yticks(np.arange(0,11,1.0))
ax2.set_yticklabels(ax2.get_yticks(),fontsize=minorFontSize)

ax2.set_ylim(0,8)
ax2.set_ylabel('% GC Standard Deviation', fontsize=minorFontSize)
ax2.set_xlabel('log(Window Size(kb))', fontsize=minorFontSize)
#ax2.set_title('Standard Deviation of GC Content for Multiple Window Sizes', fontsize=majorFontSize)
ax2.legend(loc=5,bbox_to_anchor=(1.9, 0.5), prop={'size':8})

ax2.get_xaxis().tick_bottom()
ax2.get_yaxis().tick_left()

ax2.text(-0.22, 1.05, 'B', transform=ax2.transAxes,
      fontsize=majorFontSize, fontweight='bold', va='top', ha='right')
#############################################################################################
fig.savefig('../fig/N50_plot_and_GC_std.pdf',bbox_inches='tight',pad_inches=0.1)
fig.savefig('../fig/N50_plot_and_GC_std.png',bbox_inches='tight',pad_inches=0.1)
plt.show()
/home/dut/anaconda3/envs/anaconda2/lib/python2.7/site-packages/ipykernel/__main__.py:41: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
{'Anolis_carolinensis': 'Carolina anole lizard', 'Canis_familiaris': 'Dog', '2_Python_molurus_bivittatus-5': 'Burmese python', 'Gallus_gallus': 'Chicken', 'Mus_musculus': 'Mouse', 'Danio_rerio': 'Zebrafish', 'Aspidoscelis_marmorata': 'Aspidoscelis marmoratus', 'Pelodiscus_sinensis': 'Chinese softshell turtle', 'Homo_sapiens': 'Human'}
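Panel A marks the N50 and N90 of the assembly. Nx is the length of the scaffold at which the running total of scaffold lengths, taken from longest to shortest, first reaches x% of the assembly. The sketch below computes it from a plain list of lengths; the function name `nx` and the toy lengths are illustrative assumptions, and `nxDF` itself is defined outside this section.

```python
def nx(lengths, x):
    """Return Nx: the scaffold length at which the cumulative sum of
    lengths (longest first) first reaches x percent of the total."""
    if not lengths:
        raise ValueError('lengths must not be empty')
    target = sum(lengths) * x / 100.0
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length

toy_lengths = [3, 10, 5, 8, 4]   # hypothetical scaffold sizes, total = 30
print(nx(toy_lengths, 50), nx(toy_lengths, 90))
```

For the toy lengths, the two longest scaffolds (10 + 8) already cover more than half of the 30 bases, so N50 is 8; reaching 90% takes four scaffolds, so N90 is 4.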
In [211]:
x_buffer = 10000
num_clusters = len(hoxGeneCoordsDF.seqid.unique())


sns.set(font_scale=1.0, style='white')
fig=plt.figure(figsize=(6.4,3.5))
fig.subplots_adjust(hspace=0,wspace=0)
gs = gridspec.GridSpec(num_clusters, 1)
base_cluster_length = hoxGeneCoordsDF.stop.max() + x_buffer
for i,scaffold in enumerate(['Scpiz6a_30.1','Scpiz6a_86','Scpiz6a_37','Scpiz6a_1']):
    ax = plt.subplot(gs[i, :])
    ax.set_ylim(0,4)
    #base_cluster_length = hoxGeneCoordsDF[hoxGeneCoordsDF.seqid == scaffold].stop.max() + x_buffer
    ax.set_xlim(-x_buffer, base_cluster_length + x_buffer)
    cluster_patch = coord_to_square(coord = (0,base_cluster_length), level = 2, color='lightgrey', width=0.15)
    ax.add_patch(cluster_patch)
    counter = 0 
    for _, row in hoxGeneCoordsDF[hoxGeneCoordsDF.seqid == scaffold].iterrows():
        if row['strand'] == '+':
            color = 'firebrick'
        else:
            color = 'midnightblue'
        evidence_square = coord_to_square(coord = (row['start'], row['stop']), level=2, color=color, width=0.45)
        ax.add_patch(evidence_square)
        if counter % 2 == 0:
            text_div = 15.0
        else:
            text_div = 1.3
        
        ax.annotate(row['gene'], xy=(row['start'] + ((row['stop'] - row['start']) / text_div),
                                   4.0), ha='left', fontsize=6, rotation=40)
        counter+=1 
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)
    ax.spines['bottom'].set_visible(False)
    ax.spines['left'].set_visible(False)
    ax.set_yticks([])
    xticks = ax.get_xticks()
    ax.set_xticks([])
ax.spines['bottom'].set_visible(True)
ax.set_xticks(np.arange(0,280000,20000))
x1kb_labels = [str(x/1000) for x in ax.get_xticks()]
ax.set_xticklabels(x1kb_labels,fontsize=minorFontSize,rotation=90)
ax.set_xlabel("scale (KB)")
fig.savefig('../fig/hox_clusters.pdf',bbox_inches='tight',pad_inches=0.1)
fig.savefig('../fig/hox_clusters.png',bbox_inches='tight',pad_inches=0.1,dpi=1000)
In [212]:
sns.set_style("whitegrid")
fig=plt.figure(figsize=(6.4,6.4))
fig.subplots_adjust(hspace=0.3,wspace=0.30)
# Size the grid from the number of animals; a fixed 2x2 layout overruns
# its row/column lists as soon as profCountDF holds more than four animals.
animals = profCountDF.animal.unique()
n_cols = 2
n_rows = int(math.ceil(len(animals) / float(n_cols)))
gs = gridspec.GridSpec(n_rows, n_cols)
letters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

for i, animal in enumerate(animals):
    ax = plt.subplot(gs[i // n_cols, i % n_cols])
    outer_lim = 50
    cov_dist = profCountDF[profCountDF.animal == animal][['cov','occurence']].groupby('cov')['occurence'].sum().reset_index()
    cov_mean = sum(cov_dist['cov'] * cov_dist.occurence)/cov_dist.occurence.sum()
    # Fold every coverage >= outer_lim into the outer_lim bin.
    over_lim = cov_dist['cov'] >= outer_lim
    cov_dist.loc[over_lim, 'occurence'] = cov_dist.loc[over_lim, 'occurence'].sum()
    cov_dist = cov_dist[cov_dist['cov'] <= outer_lim]

    ax = cov_dist.plot(x='cov',y='occurence',kind='bar',legend=True,ax=ax,  color=color_ids[animal], label=change_name(animal), fontsize=minorFontSize)
    ticks = ax.xaxis.get_ticklocs()
    ticklabels = [l.get_text() for l in ax.xaxis.get_ticklabels()]
    ticklabels[-1] = r'$\geq%s$'%str(outer_lim)
    ax.xaxis.set_ticks(ticks[::5])
    ax.xaxis.set_ticklabels(ticklabels[::5],rotation=90, fontsize=minorFontSize)
    ax.set_ylabel('Number of Sites', fontsize=minorFontSize)
    ax.set_xlabel('Coverage', fontsize=minorFontSize)
    #ax.set_title( 'Animal:'%change_name(animal), fontsize=minorFontSize)
    ax.set_ylim(0,140000000)
    ax.fill_between((cov_mean, ax.get_xlim()[1]),ax.get_ylim()[0],ax.get_ylim()[1],color='grey',alpha=0.2) 
    ax.text(26,100000000,'avg. cov = %s' % str(round(cov_mean,2)), fontsize=minorFontSize)
    ax.text(-0.18, 1.08, letters[i], transform=ax.transAxes,
      fontsize=majorFontSize, fontweight='bold', va='top', ha='right')
fig.savefig('../fig/supplemental_coverage_distributions.pdf',bbox_inches='tight',pad_inches=0.1)
fig.savefig('../fig/supplemental_coverage_distributions.png',bbox_inches='tight',pad_inches=0.1,dpi=500)
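Each coverage panel collapses all depths at or above `outer_lim` into one final bar and reports the site-weighted mean coverage. A minimal stand-alone sketch of those two steps follows, using a plain dict of depth to site count instead of `profCountDF`; the function names and toy counts are assumptions for illustration only.

```python
def mean_coverage(cov_counts):
    """Site-weighted mean depth from {coverage: number_of_sites}."""
    total_sites = sum(cov_counts.values())
    return sum(cov * n for cov, n in cov_counts.items()) / float(total_sites)

def cap_coverage(cov_counts, outer_lim):
    """Fold every depth >= outer_lim into a single outer_lim bin."""
    capped = {}
    for cov, n in cov_counts.items():
        key = min(cov, outer_lim)
        capped[key] = capped.get(key, 0) + n
    return capped

toy_counts = {1: 10, 2: 30, 3: 40, 5: 15, 9: 5}   # hypothetical depth histogram
print(mean_coverage(toy_counts))
print(sorted(cap_coverage(toy_counts, outer_lim=3).items()))
```

The mean is computed before capping, as in the plotting cell, so the reported average is not biased by the collapsed tail. This version also still produces the final bin when no site has a depth of exactly `outer_lim`, a case in which the DataFrame filter on `cov <= outer_lim` would drop the tail entirely.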
In [ ]:
sns.set_style("whitegrid")
fig=plt.figure(figsize=(6.4,6.4))
fig.subplots_adjust(hspace=0.6,wspace=0.30)
gs = gridspec.GridSpec(2, 2)
row = [0,0,1,1]
column = [0,1,0,1]
covValuesDict = {'A_tigris8450':18,
'Atig_122':18,
'Atig003':18,
'Atig001':16,
}

letters = ['A', 'B', 'C', 'D']


for i, animal in enumerate(tntvDF.animal.unique()):
    ax = plt.subplot(gs[row[i], column[i]])
    ax.text(-0.18, 1.08, letters[i], transform=ax.transAxes,fontsize=majorFontSize, fontweight='bold', va='top', ha='right')
    cov = covValuesDict[animal]
    subed = tntvDF[(tntvDF.animal == animal) & (tntvDF['cov']==cov)&(tntvDF.definable==True)]
    ax = sns.barplot(x = subed.allele_count, y = subed.transitions + subed.transversions, color = "lightblue",ax=ax)

    #Plot 2 - overlay - "bottom" series
    bottom_plot = sns.barplot(x = subed.allele_count, y = subed.transversions, color = "purple",ax=ax)

    #legend
    topbar = plt.Rectangle((0,0),1,1,fc="lightblue", edgecolor = 'none')
    bottombar = plt.Rectangle((0,0),1,1,fc='purple',  edgecolor = 'none')
    l = ax.legend([bottombar, topbar], ['Transversions', 'Transitions'], loc=1, ncol = 1, fontsize=minorFontSize
                  )
    l.draw_frame(False)

    #label bars
    rects = ax.patches

    # Now make some labels
    labels = [str(ratio)[:4] for ratio in subed.tn_tv_ratio]

    for rect, label in zip(rects, labels):
        height = rect.get_height()
        ax.text(rect.get_x() + rect.get_width()/2, height + 25, label, ha='center',rotation=0, va='bottom', fontsize=minorFontSize-1)

    bottom_plot.set_ylabel("Number of Sites In Genome", fontsize=minorFontSize)
    bottom_plot.set_xlabel("Genotype", fontsize=minorFontSize)
    ax.set_xticklabels(ax.get_xticklabels(),rotation=45, fontsize=minorFontSize)
    #plt.xticks(rotation=45)


    ax.set_title('%s Genotypes at %sX'%(change_name(animal), str(cov)), fontsize=minorFontSize)
    ax.ticklabel_format(style='sci',scilimits=(-3,4),axis='y',useOffset=True)
    ax.set_yticks(xrange(0,4500000,500000))

fig.savefig('../fig/supplemental_tntv.pdf',bbox_inches='tight',pad_inches=0.1)
fig.savefig('../fig/supplemental_tntv.png',bbox_inches='tight',pad_inches=0.1,dpi=500)
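The stacked bars separate transitions (changes within the purines or within the pyrimidines: A<->G, C<->T) from transversions (all other single-base changes), and the bar labels give their ratio. `tntvDF` already carries those counts; the sketch below only illustrates how a pair of alleles can be classified and a ratio derived. The function names and the toy allele pairs are hypothetical.

```python
PURINES = frozenset('AG')
PYRIMIDINES = frozenset('CT')

def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for two different bases."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= (PURINES | PYRIMIDINES):
        raise ValueError('expected two different bases from A, C, G, T')
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return 'transition' if same_class else 'transversion'

def tn_tv_ratio(pairs):
    """Ratio of transitions to transversions over (ref, alt) pairs."""
    kinds = [classify_substitution(r, a) for r, a in pairs]
    transversions = kinds.count('transversion')
    if transversions == 0:
        return float('nan')
    return kinds.count('transition') / float(transversions)

# Hypothetical allele pairs: three transitions and two transversions.
toy_pairs = [('A', 'G'), ('C', 'T'), ('G', 'A'), ('A', 'C'), ('T', 'G')]
print(tn_tv_ratio(toy_pairs))
```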
In [ ]:
sns.set_style('whitegrid')
fig=plt.figure(figsize=(6.4,5.8))
fig.subplots_adjust(hspace=0.9,wspace=0.90)
gs = gridspec.GridSpec(3, 4)
ax1 = plt.subplot(gs[0, 0])
ax1_5 = plt.subplot(gs[0, 1])
ax2 = plt.subplot(gs[0, 2:])
ax3 = plt.subplot(gs[1, 2:])
ax4 = plt.subplot(gs[2, 0:2])
ax5 = plt.subplot(gs[2, 2:])
all_axes = [ax1, ax1_5, ax2, ax3, ax4, ax5]

######################################################################################################################################################

sns.set_style("whitegrid", {'axes.grid' : False})
animal="A_tigris8450"
outer_lim = 50
cov_dist = profCountDF[profCountDF.animal == animal][['cov','occurence']].groupby('cov')['occurence'].sum().reset_index()
cov_mean = sum(cov_dist['cov'] * cov_dist.occurence)/cov_dist.occurence.sum()
cov_dist.loc[cov_dist['cov'] >= outer_lim, 'occurence'] = cov_dist.loc[cov_dist['cov'] >= outer_lim, 'occurence'].sum()
cov_dist = cov_dist[cov_dist['cov'] <= outer_lim] 

ax1 = cov_dist.plot(x='cov',y='occurence',kind='bar',legend=True,ax=ax1,  color=color_ids[animal], label=change_name(animal), fontsize=minorFontSize, edgecolor=color_ids[animal])

ticks = ax1.xaxis.get_ticklocs()
ticklabels = [l.get_text() for l in ax1.xaxis.get_ticklabels()]
ticklabels[-1] = r'$\geq%s$'%str(outer_lim)
ax1.xaxis.set_ticks(ticks[::5])
ax1.xaxis.set_ticklabels(ticklabels[::5],rotation=90, fontsize=minorFontSize-2)
ax1.set_ylabel('Number of Sites', fontsize=minorFontSize)
ax1.set_xlabel('Coverage', fontsize=minorFontSize)
#ax1.set_title("Distribution of Coverage: %s" % change_name(animal))
ax1.set_ylim(0,140000000)

#add fill
ax1.fill_between((cov_mean, ax1.get_xlim()[1]),ax1.get_ylim()[0],ax1.get_ylim()[1],color='grey',alpha=0.2) 

#add figure label
ax1.legend(prop={'size':6})

ax1.text(-0.1, 1.3, 'A', transform=ax1.transAxes, fontsize=majorFontSize, fontweight='bold', va='top', ha='right')

####################################################################################################################################################
sns.set_style("whitegrid", {'axes.grid' : False})

animal="Atig_122"
outer_lim = 50
#profCountDF, animal
cov_dist = profCountDF[profCountDF.animal == animal][['cov','occurence']].groupby('cov')['occurence'].sum().reset_index()
cov_mean = sum(cov_dist['cov'] * cov_dist.occurence)/cov_dist.occurence.sum()
cov_dist.loc[cov_dist['cov'] >= outer_lim, 'occurence'] = cov_dist.loc[cov_dist['cov'] >= outer_lim, 'occurence'].sum()
cov_dist = cov_dist[cov_dist['cov'] <= outer_lim] 

ax1_5 = cov_dist.plot(x='cov',y='occurence',kind='bar',legend=True,ax=ax1_5,  color=color_ids[animal], label=change_name(animal), fontsize=minorFontSize, edgecolor=color_ids[animal])
ticks = ax1_5.xaxis.get_ticklocs()
ticklabels = [l.get_text() for l in ax1_5.xaxis.get_ticklabels()]
ticklabels[-1] = r'$\geq%s$'%str(outer_lim)
ax1_5.xaxis.set_ticks(ticks[::5])
ax1_5.xaxis.set_ticklabels(ticklabels[::5],rotation=90, fontsize=minorFontSize-1)
ax1_5.set_ylabel('Number of Sites', fontsize=minorFontSize)
ax1_5.set_xlabel('Coverage', fontsize=minorFontSize)
#ax1_5.set_title(change_name(animal))
ax1_5.set_ylim(0,140000000)

#add fill
ax1_5.fill_between((cov_mean, ax1_5.get_xlim()[1]),ax1_5.get_ylim()[0],ax1_5.get_ylim()[1],color='grey',alpha=0.2) 

ax1_5.legend(prop={'size':6})

ax1_5.text(-0.1, 1.3, 'A', transform=ax1.transAxes,
      fontsize=majorFontSize, fontweight='bold', va='top', ha='right')


####################################################################################################################################################
sns.set_style("whitegrid", {'axes.grid' : True})

for animal in perCovEqualRatesDF.animal.unique():
    ax2 = perCovEqualRatesDF[(perCovEqualRatesDF.animal == animal) & (perCovEqualRatesDF['cov'] > 2)].plot(x='cov',y='het_per_10kb',
                                                                                                           ylim=(-0.5,18), xlim = (3.5,100.5), 
                                                                                                           style='--o', yticks = np.arange(0,18,1),
                                                                                                           logy=True,xticks=np.arange(0,102,4),
                                                                                                           label=change_name(animal), ax=ax2,
                                                                                                           color=color_ids[animal], fontsize=minorFontSize, 
                                                                                                           markersize=4, linewidth=0.8)
    ax2.set_ylabel('Equal Het. Sites per 10kb', fontsize=minorFontSize)
    ax2.set_xlabel('Coverage', fontsize=minorFontSize)
    #ax2.set_title('Rate of Even Split Heterozygous Sites vs Coverage', fontsize=minorFontSize)
    ax2.legend(loc=5, bbox_to_anchor=(1.32, 0.5), prop={'size':8})



ax2.set_xticklabels(ax2.get_xticks(),rotation=90, fontsize=minorFontSize-2)
#ax2.yaxis.set_major_formatter(ScalarFormatter())
#ax2.yaxis.set_major_formatter(ticker.FuncFormatter(lambda y,pos: ('{{:.{:1d}f}}'.format(int(np.maximum(-np.log10(y),0)))).format(y)))

ax2.text(-0.1, 1.3, 'B', transform=ax2.transAxes,
      fontsize=majorFontSize, fontweight='bold', va='top', ha='right')

###################################################################################################################################################

animal = 'Atig_122'
cov = 18
subed = tntvDF[(tntvDF.animal == animal) & (tntvDF['cov']==cov)&(tntvDF.definable==True)]
ax3 = sns.barplot(x = subed.allele_count, y = subed.transitions + subed.transversions, color = "lightblue",ax=ax3)

#Plot 2 - overlay - "bottom" series
bottom_plot = sns.barplot(x = subed.allele_count, y = subed.transversions, color = "purple",ax=ax3)

#legend
topbar = plt.Rectangle((0,0),1,1,fc="lightblue", edgecolor = 'none')
bottombar = plt.Rectangle((0,0),1,1,fc='purple',  edgecolor = 'none')
l = ax3.legend([bottombar, topbar], ['Transversions', 'Transitions'], loc=1, ncol = 1, fontsize=minorFontSize
              )
l.draw_frame(False)

#label bars
rects = ax3.patches

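# Label each bar with its Tn/Tv ratio, truncated to four characters.
# The top-series bars were drawn first, so they come first in ax3.patches.
labels = ['%s'%str(i)[:4] for i in subed.tn_tv_ratio]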

for rect, label in zip(rects, labels):
    height = rect.get_height()
    ax3.text(rect.get_x() + rect.get_width()/2, height + 25, label, ha='center',rotation=0, va='bottom', fontsize=minorFontSize-2)

bottom_plot.set_ylabel("Number of Sites", fontsize=minorFontSize)
bottom_plot.set_xlabel("Genotype", fontsize=minorFontSize)
ax3.set_xticklabels(ax3.get_xticklabels(),rotation=45, fontsize=minorFontSize)
#plt.xticks(rotation=45)


ax3.set_title('%s Genotypes at %sX'%(change_name(animal), str(cov)), fontsize=minorFontSize)
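ax3.ticklabel_format(style='sci',scilimits=(-3,4),axis='y',useOffset=True)
# ticklabel_format takes no fontsize argument; set the tick label size separately
ax3.tick_params(axis='y', labelsize=minorFontSize)
ax3.set_yticks(np.arange(0,4500000,500000))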


ax3.text(-0.1, 1.3, 'D', transform=ax3.transAxes,
      fontsize=majorFontSize, fontweight='bold', va='top', ha='right')

##################################################################################################################################################


animal = 'A_tigris8450'
cov = 18
subed = tntvDF[(tntvDF.animal == animal) & (tntvDF['cov']==cov)&(tntvDF.definable==True)]
ax4= sns.barplot(x = subed.allele_count, y = subed.transitions + subed.transversions, color = "lightblue",ax=ax4)

#Plot 2 - overlay - "bottom" series
bottom_plot = sns.barplot(x = subed.allele_count, y = subed.transversions, color = "purple",ax=ax4)

#legend
topbar = plt.Rectangle((0,0),1,1,fc="lightblue", edgecolor = 'none')
bottombar = plt.Rectangle((0,0),1,1,fc='purple',  edgecolor = 'none')
l = ax4.legend([bottombar, topbar], ['Transversions', 'Transitions'], loc=1, ncol = 1, fontsize=minorFontSize
              )
l.draw_frame(False)

#label bars
rects = ax4.patches

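# Label each bar with its Tn/Tv ratio, truncated to four characters.
# The top-series bars were drawn first, so they come first in ax4.patches.
labels = ['%s'%str(i)[:4] for i in subed.tn_tv_ratio]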

for rect, label in zip(rects, labels):
    height = rect.get_height()
    ax4.text(rect.get_x() + rect.get_width()/2, height + 25, label, ha='center',rotation=0, va='bottom', fontsize=minorFontSize-2)

bottom_plot.set_ylabel("Number of Sites", fontsize=minorFontSize)
bottom_plot.set_xlabel("Genotype", fontsize=minorFontSize)
ax4.set_xticklabels(ax4.get_xticklabels(),rotation=45, fontsize=minorFontSize)


ax4.set_title('%s Genotypes at %sX'%(change_name(animal), str(cov)), fontsize=minorFontSize)

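ax4.ticklabel_format(style='sci',scilimits=(-3,4),axis='y',useOffset=True)
# ticklabel_format takes no fontsize argument; set the tick label size separately
ax4.tick_params(axis='y', labelsize=minorFontSize)
ax4.set_yticks(np.arange(0,4500000,500000))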


ax4.text(-0.1, 1.3, 'E', transform=ax4.transAxes,
      fontsize=majorFontSize, fontweight='bold', va='top', ha='right')

##################################################################################################################################################

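# Distribution of rolling means over 100 consecutive 10kb windows (1Mb) for each animal.
# Select het_sites before rolling so non-numeric columns are never averaged.
for animal in animal_ids:
    sns.distplot(equalHetWindowsDF[equalHetWindowsDF.animal==animal].het_sites.rolling(100).mean().dropna(),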
                 kde=False,
                 hist_kws={"alpha":1,"color":color_ids[animal],"linewidth":2,"histtype": "step"},
                 label=id_to_name[animal],
                 ax=ax5)


#ax5.set_title('1Mb Heterozygosity Distribution', fontsize=minorFontSize)
ax5.set_ylabel('Number of Windows', fontsize=minorFontSize)
ax5.set_xlabel('Mean of Het Sites per 10kb', fontsize=minorFontSize)
ax5.legend(fontsize=minorFontSize)
ax5.ticklabel_format(style='sci',scilimits=(-3,4),axis='y',useOffset=True)

ax5.text(-0.15, 1.3, 'F', transform=ax5.transAxes,
      fontsize=majorFontSize, fontweight='bold', va='top', ha='right')


#####################################################################################################################################################
fig.savefig('../fig/heterozygosity_qc.pdf', bbox_inches='tight', pad_inches=0.2)
fig.savefig('../fig/heterozygosity_qc.png', bbox_inches='tight', pad_inches=0.2, dpi=500)
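The coverage panel above folds every site at or beyond outer_lim into a single closing bin before plotting. A minimal sketch of that step as a standalone helper follows. The function name collapse_coverage_tail and the toy counts are hypothetical and not part of the original analysis. Unlike the cell above, the sketch builds the closing bin explicitly, so it does not depend on a row with cov == outer_lim already being present.

```python
import pandas as pd


def collapse_coverage_tail(cov_dist, outer_lim):
    """Fold all rows with cov >= outer_lim into one row at cov == outer_lim.

    Hypothetical helper: expects the 'cov' and 'occurence' columns used in
    the notebook and returns a new frame, leaving the input untouched.
    """
    below = cov_dist[cov_dist['cov'] < outer_lim]
    tail_total = cov_dist.loc[cov_dist['cov'] >= outer_lim, 'occurence'].sum()
    tail = pd.DataFrame({'cov': [outer_lim], 'occurence': [tail_total]})
    return pd.concat([below, tail], ignore_index=True)


# Toy counts, made up for illustration only
toy = pd.DataFrame({'cov': [1, 2, 3, 4, 5],
                    'occurence': [10, 20, 30, 40, 50]})
collapsed = collapse_coverage_tail(toy, outer_lim=3)
print(collapsed)
```

The total number of sites is preserved by construction, which is a quick sanity check to run on real data before plotting.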
In [ ]:
sns.set_style("whitegrid")
fig=plt.figure(figsize=(6.4, 3.8))
fig.subplots_adjust(hspace=0.9,wspace=0.25)
gs = gridspec.GridSpec(2, 2)
ax1 = plt.subplot(gs[0, :])
#ax2 = plt.subplot(gs[1, 0])
ax3 = plt.subplot(gs[1, :])
#ax4 = plt.subplot(gs[2, 0])


#######################################################################################################################################################################################
#######################################################################################################################################################################################
#######################################################################################################################################################################################
#######################################################################################################################################################################################

step=500
for animal in equalHetWindowsDF.animal.unique():
    five_mb_rolling = equalHetWindowsDF[equalHetWindowsDF['animal']==animal].reset_index().het_sites.rolling(window=step)

    ax1 = five_mb_rolling.mean().plot(legend=False, 
                                      rot = 90, 
                                      style='.', 
                                      ax=ax1, 
                                      label=id_to_name[animal], 
                                      color=color_ids[animal], 
                                      markersize=2.0,
                                      fontsize=minorFontSize)

    ax1.fill_between(five_mb_rolling.mean().index, 
                     five_mb_rolling.mean()-2*five_mb_rolling.std()/np.sqrt(step), 
                     five_mb_rolling.mean()+2*five_mb_rolling.std()/np.sqrt(step), 
                     color=color_ids[animal], 
                     alpha=0.3)


# Convert window index to genomic position in Gb (each window spans 10kb)
gigabase_labels = [str((x*10000/1000000000)) for x in ax1.get_xticks()]
ax1.set_title('Genome Wide Heterozygosity 5Mb Sliding Window', fontsize=minorFontSize)
ax1.set_xticklabels(gigabase_labels, rotation=90, fontsize=minorFontSize) 
ax1.set_xlabel('Window Start Position (Gb)', fontsize=minorFontSize)
ax1.set_ylabel('Avg. Het. Sites / 10kb', fontsize=minorFontSize)
ax1.set_ylim(-0.1,4)
ax1.legend(loc=5, bbox_to_anchor=(1.25, 0.5),markerscale=5,fontsize = minorFontSize)
ax1.text(-0.1, 1.25, 'A', transform=ax1.transAxes,
      fontsize=majorFontSize, fontweight='bold', va='top', ha='right')
#######################################################################################################################################################################################
#######################################################################################################################################################################################
#######################################################################################################################################################################################
#######################################################################################################################################################################################

step=100
scaffold='Scpiz6a_45'
lns = None
for animal in equalHetWindowsDF.animal.unique():
    scaffold_het_rolling = equalHetWindowsDF[(equalHetWindowsDF['animal']==animal) & (equalHetWindowsDF['chrom']==scaffold)].reset_index().het_sites.rolling(window=step)
    ln1 = ax3.plot(scaffold_het_rolling.mean(),
           '.',
            color=color_ids[animal],
            label=id_to_name[animal],
            markersize=2.5)

    ax3.fill_between(scaffold_het_rolling.mean().index, 
                     scaffold_het_rolling.mean()-2*scaffold_het_rolling.std()/np.sqrt(step), 
                     scaffold_het_rolling.mean()+2*scaffold_het_rolling.std()/np.sqrt(step), 
                     color=color_ids[animal], 
                     alpha=0.3)
    if not lns:
        lns = ln1
    else:
        lns+=ln1
    
ax3.set_ylim(-2.7, 5)    
ax3.set_yticks(np.arange(0,5.0,1.0))
ax3.set_yticklabels(ax3.get_yticks(), fontsize=minorFontSize)
ax3.set_ylabel('Avg. Het. Sites / 10kb', fontsize=minorFontSize)


megabase_labels = [str((x*10000/1000000)) for x in ax3.get_xticks()]
ax3.set_xticklabels(megabase_labels,rotation=90, fontsize=minorFontSize) 
ax3.set_xlabel("Window Start Position (Mb)", fontsize=minorFontSize)

ax3.set_title('%dMb Sliding Window Scaffold: %s' % (step*10000/1000000, scaffold), fontsize=minorFontSize)
# scaffold_het_rolling here is the one left over from the last animal in the loop above
ax3.set_xlim(scaffold_het_rolling.mean().dropna().index.min(),scaffold_het_rolling.mean().dropna().index.max())

labs = [l.get_label() for l in lns]
ax3.legend(lns, labs, loc=5, bbox_to_anchor=(1.25, 0.5),markerscale=5, fontsize=minorFontSize)
ax3.text(-0.1, 1.25, 'B', transform=ax3.transAxes,
         fontsize=majorFontSize, fontweight='bold', va='top', ha='right')


chrom = ax3.add_collection(BrokenBarHCollection(
        [(scaffold_het_rolling.mean().dropna().index.min()+2, scaffold_het_rolling.mean().dropna().index.max()-105)], 
        (-2.2,1.5),
        facecolors=color_ids['A_tigris8450'],
        alpha=0.25,
        linewidths=[1]
                                      )
    )


ax3.add_collection(BrokenBarHCollection(
        vmnrDF[vmnrDF['feature']=='gene'].broken_bars, 
        (-2.2,1.5),
        facecolors='black',
        linewidths=0.5
                                      )
    )



fig.savefig('../fig/genome_wide_heterozygosity.pdf', bbox_inches='tight', pad_inches=0.25)
fig.savefig('../fig/genome_wide_heterozygosity.png', bbox_inches='tight', pad_inches=0.25, dpi=500)

plt.show()
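Both panels above shade a band of two standard errors around the rolling mean, computed as mean ± 2·std/sqrt(step). A minimal sketch of that calculation on toy numbers follows. The helper names rolling_mean_band and window_to_gb are hypothetical, and the 10 kb window size is taken from the x*10000 conversion used for the tick labels. Overlapping rolling windows share most of their data, so the band is best read as a visual guide rather than a formal confidence interval.

```python
import numpy as np
import pandas as pd


def rolling_mean_band(values, window, n_se=2):
    """Rolling mean with a +/- n_se standard-error band.

    Hypothetical helper mirroring the fill_between calls in the cell above:
    standard error = rolling std / sqrt(window).
    """
    roll = pd.Series(values).reset_index(drop=True).rolling(window=window)
    mean = roll.mean()
    half_width = n_se * roll.std() / np.sqrt(window)
    return pd.DataFrame({'mean': mean,
                         'lower': mean - half_width,
                         'upper': mean + half_width})


def window_to_gb(window_index, window_bp=10000):
    """Convert a window index to a genomic position in Gb (assumes 10 kb windows)."""
    return window_index * window_bp / 1e9


# Toy het-site counts per window, made up for illustration only
toy_het = [0, 2, 4, 2, 0, 2]
band = rolling_mean_band(toy_het, window=3)
print(band)
print(window_to_gb(100000))
```

The first window - 1 rows are NaN because the rolling window is not yet full, which is why the plotting code above calls dropna() before reading index limits.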
In [ ]:
#head -n 1 A_tigris8450.merged.dedup.realigned.prof > A_tigris8450.merged.dedup.realigned.Scpiz6a_45.prof
#nohup cat A_tigris8450.merged.dedup.realigned.prof | awk '$1=="Scpiz6a_45"' >> A_tigris8450.merged.dedup.realigned.Scpiz6a_45.prof &
#head -n 1 A_tigris8450.merged.dedup.realigned.prof > A_tigris8450.merged.dedup.realigned.Scpiz6a_122.prof
#awk '$1=="Scpiz6a_122"' A_tigris8450.merged.dedup.realigned.prof  >> A_tigris8450.merged.dedup.realigned.Scpiz6a_122.prof &
#head -n 1 A_tigris8450.merged.dedup.realigned.prof > A_tigris8450.merged.dedup.realigned.Scpiz6a_47.prof
#awk '$1=="Scpiz6a_47"' A_tigris8450.merged.dedup.realigned.prof  >> A_tigris8450.merged.dedup.realigned.Scpiz6a_47.prof &
#covScaffold_45_8450 = pd.read_csv('../data/pysam/A_tigris8450.merged.dedup.realigned.Scpiz6a_45.prof',sep='\t')
In [ ]:
# step=1000
# rolling = covScaffold_45_8450.rolling(step,)['cov']

# ax = rolling.std().plot(title='Scaffold 45 8450 Standard Deviation Coverage %sbp Sliding Window' % str(step), 
#                          legend=False, 
#                          rot = 90, 
#                          style='-',
#                          #label=animal,
#                          color='darkblue',
#                          markersize=1.0,
#                          alpha=0.5,
#                          figsize=(20,8),
#                         label='Rolling Average')

# # ax.fill_between(rolling.mean().index, 
# #                     rolling.mean()-2*five_mb_rolling.std()/np.sqrt(step), 
# #                     rolling.mean()+2*five_mb_rolling.std()/np.sqrt(step), 
# #                     color='blue', 
# #                     alpha=0.3)
# print(covScaffold_45_8450['cov'].mean())
# ax.hlines(y=covScaffold_45_8450['cov'].mean(),xmin=ax.get_xlim()[0],xmax=ax.get_xlim()[1],label='Scaffold Average')

# megabase_labels = [str((x/1e6)) for x in ax.get_xticks()]
# ax.set_xticklabels(megabase_labels,rotation=90) 
# ax.set_xlabel('Window Start Position (Mb)')
# ax.set_ylabel('Standard Deviation Coverage')
# ax.set_ylim(-5, 250)
# ax.legend(markerscale=10.0)
# #ax.set_yscale('log')
In [ ]:
step = 500
# Rolling mean of het sites over `step` consecutive 10kb windows for each animal
mother = equalHetWindowsDF[equalHetWindowsDF.animal == 'Atig_122'].reset_index().het_sites.rolling(window=step).mean()
fp_animal = equalHetWindowsDF[equalHetWindowsDF.animal == 'A_tigris8450'].reset_index().het_sites.rolling(window=step).mean()
ratio = mother/fp_animal
print(min(ratio.dropna()),max(ratio.dropna()))
# Plot log2 of the per-window ratio; the legend is off, so no series label is needed
ax = np.log2(ratio).plot(legend=False, 
                rot = 90, 
                style='.',
                color='black',
                markersize=2.0,
               figsize=(6.4,2),
               fontsize=minorFontSize)
ax.set_title('log2 Ratio of 122:8450 Average Het. Sites per 10kb, %dMb Sliding Window' % (step*10000/1000000), fontsize=minorFontSize)
# Convert window index to genomic position in Gb (each window spans 10kb)
gigabase_labels = [str((x*10000/1000000000)) for x in ax.get_xticks()]
ax.set_xticklabels(gigabase_labels,rotation=90, fontsize=minorFontSize) 
ax.set_xlabel('Window Start Position (Gb)', fontsize=minorFontSize)
ax.set_ylabel('log2 ratio', fontsize=minorFontSize)
ax.set_ylim(0,9)
mean_r = np.log2(ratio).mean()
# Standard deviation of the log2 ratios (not a standard error)
sd_r = np.log2(ratio).std()
ax.text(1.0e4,0.25, 'average log2(ratio) = %s\nstandard deviation log2(ratio) = %s'%(str(mean_r), str(sd_r)), fontsize=minorFontSize)
fig = ax.get_figure()
fig.savefig('../fig/supplemental_log2_ratio.pdf', bbox_inches='tight', pad_inches=0.25)
fig.savefig('../fig/supplemental_log2_ratio.png', bbox_inches='tight', pad_inches=0.25, dpi=500)
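The ratio above divides one rolling mean by another, so a window where the denominator is zero gives an infinite ratio, and a window where the numerator is zero gives a log2 of minus infinity. Either case would carry through into the reported mean and standard deviation. A minimal sketch of one way to guard against this follows, assuming it is acceptable to mask such windows. The helper name log2_ratio and the toy values are hypothetical, and whether masking suits the real data is a judgment call this sketch does not settle.

```python
import numpy as np
import pandas as pd


def log2_ratio(numerator, denominator):
    """log2(numerator / denominator), with non-positive windows masked as NaN.

    Hypothetical helper: windows where either input is <= 0 are excluded so
    that summary statistics stay finite.
    """
    num = pd.Series(numerator, dtype=float)
    den = pd.Series(denominator, dtype=float)
    valid = (num > 0) & (den > 0)
    ratio = (num / den).where(valid)
    return np.log2(ratio)


# Toy rolling means, made up for illustration only
toy_a = [4.0, 8.0, 0.0, 2.0]
toy_b = [1.0, 2.0, 1.0, 0.0]
lr = log2_ratio(toy_a, toy_b)
print(lr)
print(lr.mean(), lr.std())
```

pandas skips NaN in mean() and std() by default, so the summary is computed over the remaining windows only. Reporting how many windows were masked alongside the statistics keeps the figure honest.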